Methods and compositions for RNA-guided treatment of HIV infection

ABSTRACT

A method of preventing transmission of a retrovirus from a mother to her offspring, by treating the mother's host cells with a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA, and preventing transmission of the proviral DNA to the offspring.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH

This invention was made with U.S. government support under grant numbers R01MH093271, R01NS087971, and P30MH092177 awarded by the National Institutes of Health. The U.S. government may have certain rights in the invention.

BACKGROUND OF THE INVENTION

1. Technical Field

The present invention relates to compositions that specifically cleave target sequences in retroviruses, for example human immunodeficiency virus (HIV). Such compositions, which can include nucleic acids encoding a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease and a guide RNA sequence complementary to a target sequence in a human immunodeficiency virus, can be administered to a subject having or at risk for contracting an HIV infection.

2. Background Art

For more than three decades since the discovery of HIV-1, AIDS has remained a major public health problem, affecting more than 35.3 million people worldwide. AIDS remains incurable due to the permanent integration of HIV-1 into the host genome. Current therapy (highly active antiretroviral therapy or HAART) for controlling HIV-1 infection and impeding AIDS development profoundly reduces viral replication in cells that support HIV-1 infection and reduces plasma viremia to a minimal level. But HAART fails to suppress low-level viral genome expression and replication in tissues and fails to target the latently infected cells, for example, resting memory T cells, brain macrophages, microglia, astrocytes, and gut-associated lymphoid cells, that serve as a reservoir for HIV-1. Persistent HIV-1 infection is also linked to co-morbidities including heart and renal diseases, osteopenia, and neurological disorders. There is a continuing need for curative therapeutic strategies that target persistent viral reservoirs.

SUMMARY OF THE INVENTION

The present invention provides for a method of preventing transmission of a retrovirus from a mother to her offspring, by treating the mother's host cells with a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA, and preventing transmission of the proviral DNA to the offspring.

The details of one or more embodiments of the invention are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the invention will be apparent from the description and drawings, and from the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

The patent or application file contains at least one drawing executed in color. Copies of this patent or patent application publication with color drawing(s) will be provided by the Office upon request and payment of the necessary fee.

FIG. 1A, FIG. 1B, FIG. 1C, FIG. 1D, FIG. 1E, FIG. 1F, FIG. 1G, and FIG. 1H show that Cas9/LTR-gRNA suppresses HIV-1 reporter virus production in CHME5 microglial cells latently infected with HIV-1. FIG. 1A shows a representative gating diagram of EGFP flow cytometry, showing a dramatic reduction in TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP reporter virus by stably expressed Cas9 plus LTR-A or -B, vs. empty U6-driven gRNA expression vector (U6-CAG). FIG. 1B shows a SURVEYOR Cel-I nuclease assay of PCR product (−453 to +43 within LTR) from selected LTR-A- or -B-expressing stable clones, showing dramatic indel mutation patterns (arrows). FIG. 1C shows a PCR fragment analysis of a precise deletion of the 190-bp region between the LTR-A and LTR-B cutting sites (arrowhead and arrow in FIG. 1D), leaving a 306-bp fragment (arrow in FIG. 1C) validated by TA-cloning and sequencing results. FIG. 1D discloses SEQ ID NOS 1-3, respectively, in order of appearance. FIG. 1E is a graph showing that subcloning of LTR-A/B stable clones reveals complete loss of reporter reactivation determined by EGFP flow cytometry. FIG. 1F and FIG. 1G show elimination of the pNL4-3-ΔGag-d2EGFP proviral genome detected by standard (FIG. 1F) and real-time (FIG. 1G) PCR amplification of genomic DNA for EGFP and HIV-1 Rev response element (RRE); β-actin is a DNA purification and loading control. FIG. 1H shows PCR genotyping of LTR-A/B subclones (#8, 13) using primers to amplify a DNA fragment covering HIV-1 LTR U3/R/U5 regions (−411 to +129), showing indels (a, deletion; c, insertion) and “intact” or combined LTR (b).

FIG. 2A, FIG. 2B, and FIG. 2C show that Cas9/LTR-gRNA efficiently eradicates latent HIV-1 virus from U1 monocytic cells. FIG. 2A shows a diagram of the excision of the entire HIV-1 genome in chromosome Xp11.4. HIV-1 integration sites were identified using a Genome-Walker link PCR kit. Left, analysis of PCR amplicon lengths using a primer pair (P1/P2) targeting chromosome X integration site-flanking sequence reveals elimination of the entire HIV-1 genome (9709-bp), leaving two fragments (833- and 670-bp). FIG. 2B shows TA cloning and sequencing of the LTR fragment (833-bp) showing the host genomic sequence (small letters, 226-bp) and the partial sequences (634−27=607 bp) of 5′-LTR (underlined using dashes) and 3′-LTR (first underlined section) with a 27-bp deletion around the LTR-A targeting site (second underlined section). Bottom, two indel alleles identified from 15 sequenced clonal amplicons. The 670-bp fragment consists of a host sequence (226-bp) and the remaining LTR sequence (634−190=444 bp) after 190-bp excision by simultaneous cutting at LTR-A and B target sites. The underlined and highlighted sequences indicate the gRNA LTR-A target site and PAM. FIG. 2B discloses SEQ ID NOS 4-13, respectively, in order of appearance. FIG. 2C shows a functional analysis of LTR-A/B-induced eradication of HIV-1 genome, showing substantial blockade of TSA/PMA reactivation-induced p24 virion release. U1 cells were transfected with pX260-LTRs-A, -B, or -A/B. After 2-week puromycin selection, cells were treated with TSA (250 nM)/PMA for 2 days before p24 Gag ELISA was performed.
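The fragment sizes quoted for FIG. 2A and FIG. 2B follow from simple arithmetic: each amplicon is the 226-bp host flank plus whatever remains of a single 634-bp LTR after the deletion. The following is a minimal sketch of that check; the constant and function names are illustrative assumptions and do not appear in the original disclosure.

```python
# Sizes stated in the description of FIG. 2B.
HOST_FLANK_BP = 226   # host genomic sequence flanking the integration site
LTR_BP = 634          # length of one full LTR


def amplicon_size(deleted_bp: int) -> int:
    """Expected P1/P2 amplicon length after proviral excision.

    Assumes (hypothetically) that the amplicon is the host flank plus one
    combined LTR from which `deleted_bp` base pairs have been lost.
    """
    if not 0 <= deleted_bp <= LTR_BP:
        raise ValueError("deletion must lie within the LTR")
    return HOST_FLANK_BP + (LTR_BP - deleted_bp)


# 27-bp deletion around the LTR-A site, and 190-bp LTR-A/LTR-B excision.
sizes = {deleted: amplicon_size(deleted) for deleted in (27, 190)}
```

Under these assumptions the two values correspond to the 833-bp and 670-bp fragments described for FIG. 2A.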

FIG. 3A, FIG. 3B, FIG. 3C, and FIG. 3D show that stable expression of Cas9 plus LTR-A/B vaccinates TZM-bl cells against new HIV-1 virus infection. FIG. 3A shows immunocytochemistry (ICC) and Western blot (WB) analyses with anti-Flag antibody confirming the expression of Flag-Cas9 in TZM-bl stable clones puromycin (2 μg/ml)-selected for 2 weeks. FIG. 3B shows PCR genotyping of Cas9/LTR-A/B stable clones (c1-c7), revealing a close correlation of LTR excision with repression of LTR luciferase reporter activation. Fold changes represent TSA/PMA-induced levels over corresponding non-induction levels. FIG. 3C shows Cas9/LTR-A/B-expressing cells (c4) infected with pseudotyped-pNL4-3-Nef-EGFP lentivirus at the indicated multiplicity of infection (MOI), with infection efficiency measured by EGFP flow cytometry 2 d post-infection. FIG. 3D shows phase-contrast/fluorescence micrographs showing that LTR-A/B stable cells, but not control (U6-CAG; black) cells, are resistant to new infection (right panel) by pNL4-3-ΔE-EGFP HIV-1 reporter virus (gray).

FIG. 4A, FIG. 4B, FIG. 4C, and FIG. 4D illustrate the off-target effects of Cas9/LTR-A/B on the human genome. FIG. 4A is a SURVEYOR assay that shows no indel mutations in predicted/potential off-target regions in human TZM-bl and U1 cells. LTR-A on-target region (A) was used as a positive control and empty U6-CAG vector (U6) as a negative control. FIG. 4B shows whole-genome sequencing of an LTR-A/B stable TZM-bl subclone showing the numbers of called indels in the U6-CAG control and LTR-A/B samples, FIG. 4C shows detailed information on 10 called indels near gRNA target sites in both samples, and FIG. 4D shows the distribution of off-target called indels. FIG. 4C discloses SEQ ID NOS 14-15, respectively, in order of appearance.

FIG. 5 shows the LTR U3 sequence of the integrated lentiviral LTR-firefly luciferase reporter identified by TA-cloning and sequencing of PCR product (−411 to −10) from the genomic DNA of human TZM-bl cells. The protospacer and PAM (NGG) sequences of 4 gRNAs (LTR-A to D) and the predicted binding sites of indicated transcription factors are highlighted. The precise cleavage sites are marked with scissors. +1 indicates the transcriptional start site. FIG. 5 discloses SEQ ID NO: 16.

FIG. 6A, FIG. 6B, and FIG. 6C show that LTR-C and LTR-D remarkably suppress TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP virus in CHME5 microglia cells. FIG. 6A is a diagram schematically showing pNL4-3-ΔGag-d2EGFP vector containing Tat, Rev, Env, Vpu, and Nef with the reporter gene d2EGFP. FIG. 6B shows a SURVEYOR assay showing indel mutations in the on-target LTR genome of Cas9/LTR-D but not Cas9/LTR-C transfected cells. FIG. 6C shows a representative gating diagram of EGFP flow cytometry showing a dramatic reduction in TSA-induced reactivation of latent pNL4-3-ΔGag-d2EGFP reporter viruses by stable expression of Cas9/LTR-C or LTR-D as compared with empty U6-driven gRNA expression vector (U6-CAG).

FIG. 7A, FIG. 7B, FIG. 7C, FIG. 7D, FIG. 7E, and FIG. 7F show that both LTR-C and LTR-D induced indel mutations and significantly decreased constitutive and TSA/PMA-induced luciferase activity in TZM-bl cells stably incorporated with HIV-1 LTR-firefly luciferase reporter gene. FIG. 7A shows a functional luciferase reporter assay revealing a significant reduction of LTR reactivation by LTR-C, LTR-D or both. FIG. 7B shows a SURVEYOR assay showing indel mutation in LTR DNA (−453 to +43) induced by LTR-C and LTR-D (upper arrow). A combination of LTR-C and LTR-D generates a 194 bp fragment (lower arrow) resulting from the deletion of the 302 bp region between LTR-C and LTR-D. FIG. 7C and FIG. 7D show Sanger sequencing of 30 clones validating the indel efficiency at 23% for LTR-C and 13% for LTR-D and example chromatograms showing insertion/deletion. FIG. 7C discloses SEQ ID NOS 17-25, respectively, in order of appearance. FIG. 7D discloses SEQ ID NOS 26-30, respectively, in order of appearance. FIG. 7E shows PCR-restriction fragment length polymorphism (RFLP) analysis using BsaJ I to cut 5 sites (96, 102, 372, 386, 482) of the PCR product covering −453 to +43 of LTR showing two major bands (96 bp and 270 bp) in the U6-CAG control sample, but an additional 372 bp band (upper arrow) after LTR-C-induced indel mutation at the 96/102 sites, a 290 bp band (middle arrow) after LTR-D-induced mutations at the 372 site and a 180 bp fragment (lower arrow) after LTR-C/D-induced excision. FIG. 7F shows chromatograms showing the deletion of a 302 bp fragment between LTR-C and LTR-D (top) and an additional 17 bp deletion (bottom). Red arrows indicate the junction sites. *P<0.05 indicates a significant decrease in LTR-C or LTR-D-mediated luciferase activation compared to U6-CAG control. FIG. 7F discloses SEQ ID NOS 31-32, respectively, in order of appearance.

FIG. 8A, FIG. 8B, and FIG. 8C illustrate the TA cloning and Sanger sequencing of PCR products from CHME5 subclones of LTR-A/B and empty U6-CAG control using primers covering HIV-1 LTR U3/R/U5 regions (−411 to +129). FIG. 8A shows possible combinations of LTR-A and LTR-B cuts on both 5′- and 3′-LTRs generating potential fragments a-c as indicated. FIG. 8B shows a BLAST alignment of fragment a (351 bp) showing a 190 bp deletion between the LTR-A and LTR-B cut sites. FIG. 8C shows a BLAST alignment of fragment c (682 bp) showing a 175 bp insertion at the LTR-A cleavage site and a 27 bp deletion at the LTR-B cleavage site. FIG. 8C discloses SEQ ID NOS 33-34, respectively, in order of appearance.

FIG. 9A, FIG. 9B, FIG. 9C, and FIG. 9D demonstrate that Cas9/LTR-gRNA efficiently eradicates latent HIV-1 virus from U1 monocytic cells. FIG. 9A shows Sanger sequencing of a 1.1 kb fragment from long-range PCR using a primer pair (T492/T493) targeting a chromosome 2 integration site-flanking sequence (small letters, 467-bp), revealing elimination of the entire HIV-1 genome (9709-bp), leaving combined 5′-LTR (underlined using dashes) and 3′-LTR with a 6-bp insertion (boxed) precisely at the third nucleotide from PAM (TGG) LTR-A targeting site (underlined) and a 4-bp deletion (nnnn). FIG. 9A discloses SEQ ID NO: 35. FIG. 9B is a representative DNA gel picture that shows specific eradication of the HIV-1 genome. NS, non-specific band. FIG. 9C and FIG. 9D are graphs showing quantitative PCR analysis using the primer pair targeting the Gag gene (T457/T458), which shows 85% efficiency of entire HIV-1 genome eradication in Cas9/LTR-A/B-expressing U1 cells. U1 cells were transfected with pX260 empty vector (U6-CAG) or LTRs-A/B-encoding vectors. After 2-week puromycin selection, the cellular genomic DNAs were used for absolute quantitative qPCR analysis using spiked pNL4-3-ΔE-EGFP human genomic DNA as a standard. **P<0.01 indicates a significant decrease compared to the U6-CAG control.

FIG. 10A, FIG. 10B, and FIG. 10C show that Cas9/LTR gRNAs effectively eradicate HIV-1 provirus in J-Lat latently infected T cells. FIG. 10A shows functional analysis by EGFP flow cytometry revealing approximately 50% reduction of PMA- and TNFα-induced reactivation of EGFP reporter viruses. FIG. 10B is a SURVEYOR assay that shows indel mutations (arrow) in the on-target LTR genome of Cas9/LTR-A/B transfected cells. J-Lat cells were transfected with pX260 empty vector or LTRs-A and -B. After 2-week puromycin selection, cells were treated with PMA or TNFα for 24 h. The genomic DNAs were subject to PCR using primers covering HIV-1 LTR U3/R/U5 regions (−411 to +129) and the SURVEYOR assay was performed. **P<0.01 indicates a significant decrease compared to the U6-CAG control. FIG. 10C shows a PCR fragment analysis using primers covering HIV-1 LTR (−374 to +43), showing a precise deletion of the 190-bp region between the LTR-A and LTR-B cutting sites, leaving a 227-bp fragment (arrow). House-keeping gene β-actin serves as a DNA purification and loading control.

FIG. 11A, FIG. 11B, FIG. 11C, and FIG. 11D show that genome editing efficiency depends upon the presence of Cas9 and gRNAs. FIG. 11A shows PCR genotyping revealing the absence of a U6-driven LTR-A or LTR-B expression cassette and FIG. 11B shows absence/reduction of CMV-driven Cas9 DNA in puromycin-selected TZM-bl subclones without any indication of genomic editing. Genomic DNAs from indicated subclones were subject to conventional (FIG. 11A) or real-time (FIG. 11B) PCR analyses using a primer pair covering U6 promoter (T351) and LTR-A (T354) or -B (T356), and targeting Cas9 (T477/T491). FIG. 11C and FIG. 11D show that Cas9 protein expression is absent in ineffective TZM-bl subclones. FIG. 11C shows that the Flag-tagged Cas9 fusion protein was detected by Western blot (WB) and immunocytochemistry (ICC) with anti-Flag monoclonal antibody. HEK293T cell line stably expressing Flag-Cas9 was used as a positive control for WB. GAPDH serves as a protein loading control. Clone c6 contains Cas9 DNA but no Cas9 protein expression, suggesting a potential mechanism of epigenetic repression after puromycin selection. Clone c5 and c3 may represent a truncated Flag-Cas9 (tCas9). FIG. 11D shows that the nucleus was stained with Hoechst 33258.

FIG. 12A, FIG. 12B, FIG. 12C, and FIG. 12D demonstrate that stable expression of Cas9/LTR-A/B gRNAs in TZM-bl cells vaccinates against pseudotyped or native HIV-1 viruses. FIG. 12A shows flow cytometry revealing a significant reduction of native pNL4-3-ΔE-EGFP reporter virus infection efficiency in Cas9/LTR-A/B expressing TZM-bl subclones. Real-time PCR analysis reveals suppression or elimination of viral RNA as shown in FIG. 12B and DNA as shown in FIG. 12C by Cas9/LTR-A/B gRNAs. FIG. 12D shows that the firefly-luciferase luminescent assay demonstrates dramatic inhibition of virus infection-stimulated LTR promoter activity by Cas9/LTR-A/B gRNAs. The stable Cas9/LTR-A/B gRNA-expressing TZM-bl cells were infected for 2 hours with indicated native HIV-1 viruses, and washed twice with PBS. At 2 days post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression (in FIG. 12A), or lysed for total RNA extraction and RT-qPCR (in FIG. 12B), genomic DNA purification for qPCR (in FIG. 12C) and luminescence measurement (in FIG. 12D). *P<0.05 and **P<0.01 indicate significant decreases compared to the U6-CAG control.

FIG. 13 shows the predicted LTR gRNAs and their off-target numbers (100% match). The 5′-LTR sense and antisense sequences (SEQ ID NOS 79-111 and 112-141, respectively) (634 bp) of pHR′-CMV-LacZ lentiviral vector (AF105229) were utilized to search for Cas9/gRNA target sites containing a 20-bp guide sequence (protospacer) plus the protospacer adjacent motif sequence (NGG) using Jack Lin's CRISPR/Cas9 gRNA finder tool (http://spot.colorado.edu/~slin/cas9.html). Each gRNA plus NGG (AGG, TGG, GGG, CGG) was blasted against available human genomic and transcript sequences with 1000 aligned sequences being displayed. The target sequence (1-23 through 9-23 nucleotides) was then copied and pasted into the browser's find function (Control+F) to count the genomic targets with a 100% match. The number of off-targets for each search was divided by 3 because of the repeated genome library. The number shown indicates the sum of 4 searches (NGG). The top number (for example, for gRNA sequence (sense): 20, 19, 19, 17, 16, 15, 14, 13, 12) indicates the gRNA target sequences farthest from NGG. The sequence and off-target numbers for the selected LTR-A/B and LTR-C/D are highlighted red and green respectively.
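The target-site search described for FIG. 13 (a 20-nt protospacer immediately followed by an NGG PAM, scanned on both the sense and antisense strands) can be sketched in a few lines. This is an illustrative sketch only, not the gRNA finder tool cited above; the function names are hypothetical, and the off-target BLAST step is not reproduced.

```python
import re

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_cas9_sites(seq: str, guide_len: int = 20):
    """List candidate Cas9 sites as (strand, start, protospacer, pam).

    A site is `guide_len` nucleotides followed directly by an NGG PAM.
    `start` is 0-based in the coordinates of the scanned strand, so for
    the "-" strand it refers to the reverse-complemented sequence.
    """
    seq = seq.upper()
    # A lookahead is used so that overlapping sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT]GG))" % guide_len)
    sites = []
    for strand, strand_seq in (("+", seq), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites
```

Each reported protospacer plus its PAM would then be the query for the genome-wide match counting described above.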

FIG. 14 depicts the oligonucleotides for gRNA targeting sites and primers (SEQ ID NOS 36-78, respectively, in order of appearance) used for PCR and sequencing.

FIG. 15 shows the locations of predicted gRNA targeting sites of LTR-A and LTR-B and discloses “query Seq” sequences as SEQ ID NOS 142-252, and “ref Seq” sequences as SEQ ID NOS 253-363, all respectively, in order of appearance.

FIG. 16A, FIG. 16B, FIG. 16C, FIG. 16D, FIG. 16E, FIG. 16F, FIG. 16G, and FIG. 16H show that both LTR-C and LTR-D decreased constitutive and TSA/PMA-induced luciferase activity in TZM-bl cells stably incorporated with HIV-1 LTR firefly luciferase reporter gene and that the combination induced precise genome excision. FIG. 16A shows that six gRNA targets were designed for the promoter region of HIV-LTR. FIG. 16A discloses SEQ ID NO: 16. TZM-bl cells were cotransfected with Cas9-EGFP and chimera gRNA expression cassette (PCR products) by Lipofectamine 2000. FIG. 16B is a graph showing that after 3 d, EGFP-positive cells were sorted through FACS and 2000 cells per group were collected for luciferase assay. FIG. 16B discloses SEQ ID NO: 31. FIG. 16C is a graph showing that the population-sorted cells were cultured for 2 d and treated with TSA/PMA for 1 d before luciferase assay. The single cells were sorted into a 96-well plate and cultured until confluence for luciferase assay in the absence (shown in the graph of FIG. 16D) or presence (shown in the graph of FIG. 16E) of TSA/PMA for 1 d. FIG. 16F and FIG. 16G show that the PCR product from the population-sorted cells was analyzed with Surveyor Cel-I nuclease assay (FIG. 16F) and restriction fragment length polymorphism with BsaJ I (FIG. 16G) showing mutation (FIG. 16F) or uncut (FIG. 16G) band (red arrow). A 200 bp fragment (FIG. 16F, FIG. 16G, black arrow) resulting from the deletion of the 321 bp region between LTR-C and LTR-D as predicted (FIG. 16A, red arrowhead) was validated by TA-cloning and sequencing showing precise genomic excision (FIG. 16H). Sanger sequencing of PCR products from individual LTR-C and -D identified % and % indel mutation efficiency respectively. * p<0.05 indicates statistically significant reduction using a Student's t test compared to the corresponding U6-CAG control.
Protospace(E), Protospace(C), Protospace(A), Protospace(B), Protospace(D), and Protospace(F) correspond to SEQ ID NOS 365, 367, 369, 371, 373, and 375, respectively, in order of appearance.

FIG. 17A, FIG. 17B, FIG. 17C, FIG. 17D, FIG. 17E, FIG. 17F, FIG. 17G, and FIG. 17H show that Cas9/LTR-gRNA inhibited constitutive and inducible production of HIV-1 virus measured by EGFP flow cytometry in HIV-1 latently infected CHME5 microglia cell line. The pHR′ lentiviral vector containing Tat, Rev, Env, Vpu, and Nef with the reporter gene d2EGFP was transduced into human fetal microglia cell line CHME5, and a 400 bp deletion in the U3 region of the 3′-LTR is illustrated (shown in FIG. 17A). FIG. 17B is a graph showing that transient transfection of Cas9/gRNA, Human HIV-1 LTR-A, B alone or in combination decreased the intensity but not the percentage of EGFP due to suppression of LTR promoter activity. FIG. 17C is a graph showing that transient transfection of Cas9/gRNA, Human HIV-1 LTR-C, D alone or in combination decreased the intensity but not the percentage of EGFP due to suppression of LTR promoter activity. FIG. 17D and FIG. 17E are graphs showing that after antibiotic selection for 1-2 weeks, the percentage of EGFP cells was also reduced. FIG. 17F and FIG. 17G show that the PCR product from the stable selected clones was analyzed with Surveyor Cel-I nuclease assay showing indel mutation dramatically in LTR-A and LTR-B but weakly in the combination of LTR-A/B (red arrow). A 331 bp fragment (shown in FIG. 17F and FIG. 17G, black arrow) resulting from the deletion of the 190 bp region between LTR-A and LTR-B as predicted (FIG. 17H, red arrowhead) was validated by TA-cloning and sequencing showing precise genomic excision (FIG. 17H). FIG. 17H discloses SEQ ID NOS 1-3, respectively, in order of appearance.

FIG. 18 shows LTR of a representative HIV-1 sequence (SEQ ID NO: 376). The U3 region extends from nucleotide 1 to nucleotide 432 (SEQ ID NO: 377), the R region extends from nucleotide 432 to nucleotide 559 (SEQ ID NO: 378), and the U5 region extends from nucleotide 560 to nucleotide 634 (SEQ ID NO: 379).

FIG. 19 shows LTR of a representative SIV sequence (SEQ ID NO: 380). The U3 region extends from nucleotide 1 to nucleotide 517 (SEQ ID NO: 381), the R region extends from nucleotide 518 to nucleotide 693 (SEQ ID NO: 382), and the U5 region extends from nucleotide 694 to nucleotide 818 (SEQ ID NO: 383).

DETAILED DESCRIPTION OF THE INVENTION

The present invention is based, in part, on our discovery that we could eliminate the integrated HIV-1 genome from HIV-1 infected cells by using the RNA-guided Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-Cas9 nuclease system (Cas9/gRNA) in single and multiplex configurations. We identified highly specific targets within the HIV-1 LTR U3 region that were efficiently edited by Cas9/gRNA, inactivating viral gene expression and replication in latently-infected microglial, promonocytic and T cells. Cas9/gRNAs caused neither genotoxicity nor off-target editing to the host cells, and completely excised a 9709-bp fragment of integrated proviral DNA that spanned from its 5′- to 3′-LTRs. Furthermore, the presence of multiplex gRNAs within Cas9-expressing cells prevented HIV-1 infection. Our results suggest that Cas9/gRNA can be engineered to provide a specific, efficacious prophylactic and therapeutic approach against AIDS.

Accordingly, the invention features compositions comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA that is complementary to a target sequence in a retrovirus, e.g., HIV, as well as pharmaceutical formulations comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA that is complementary to a target sequence in HIV. Also featured are compositions comprising a CRISPR-associated endonuclease polypeptide and a guide RNA that is complementary to a target sequence in HIV, as well as pharmaceutical formulations comprising a CRISPR-associated endonuclease polypeptide and a guide RNA that is complementary to a target sequence in HIV.

Also featured are methods of administering the compositions to treat a retroviral infection, e.g., HIV infection, methods of eliminating viral replication, and methods of preventing HIV infection. The therapeutic methods described herein can be carried out in connection with other antiretroviral therapies (e.g., HAART).

The clinical course of HIV infection can vary according to a number of factors, including the subject's genetic background, age, general health, nutrition, treatment received, and the HIV subtype. In general, most individuals develop flu-like symptoms within a few weeks or months of infection. The symptoms can include fever, headache, muscle aches, rash, chills, sore throat, mouth or genital ulcers, swollen lymph glands, joint pain, night sweats, and diarrhea. The intensity of the symptoms can vary from mild to severe depending upon the individual. During the acute phase, the HIV viral particles are attracted to and enter cells expressing the appropriate CD4 receptor molecules. Once the virus has entered the host cell, the HIV-encoded reverse transcriptase generates a proviral DNA copy of the HIV RNA and the proviral DNA becomes integrated into the host cell genomic DNA. It is this HIV provirus that is replicated by the host cell, resulting in the release of new HIV virions which can then infect other cells. The methods and compositions of the invention are generally and variously useful for excision of integrated HIV proviral DNA, although the invention is not so limited, and the compositions may be administered to a subject at any stage of infection or to an uninfected subject who is at risk for HIV infection.

The primary HIV infection subsides within a few weeks to a few months, and is typically followed by a long clinical “latent” period which may last for up to 10 years. The latent period is also referred to as asymptomatic HIV infection or chronic HIV infection. The subject's CD4 lymphocyte numbers rebound, but not to pre-infection levels, and most subjects undergo seroconversion, that is, they have detectable levels of anti-HIV antibody in their blood, within 2 to 4 weeks of infection. During this latent period, there can be no detectable viral replication in peripheral blood mononuclear cells and little or no culturable virus in peripheral blood. During the latent period, also referred to as the clinical latency stage, people who are infected with HIV may experience no HIV-related symptoms, or only mild ones. However, the virus continues to reproduce at very low levels. In subjects who have been treated with anti-retroviral therapies, this latent period may extend for several decades or more. However, subjects at this stage are still able to transmit HIV to others even if they are receiving anti-retroviral therapy, although anti-retroviral therapy reduces the risk of transmission. As noted above, anti-retroviral therapy does not suppress low levels of viral genome expression nor does it efficiently target latently infected cells such as resting memory T cells, brain macrophages, microglia, astrocytes and gut-associated lymphoid cells.

Clinical signs and symptoms of AIDS (acquired immunodeficiency syndrome) appear as CD4 lymphocyte numbers decrease, resulting in irreversible damage to the immune system. Many patients also present with AIDS-related complications, including, for example, opportunistic infections such as tuberculosis, salmonellosis, cytomegalovirus, candidiasis, cryptococcal meningitis, toxoplasmosis, and cryptosporidiosis, as well as certain kinds of cancers, including for example, Kaposi's sarcoma, and lymphomas, as well as wasting syndrome, neurological complications, and HIV-associated nephropathy.

Compositions

The compositions of the invention include nucleic acids encoding a CRISPR-associated endonuclease, e.g., Cas9, and a guide RNA that is complementary to a target sequence in a retrovirus, e.g., HIV. In bacteria the CRISPR/Cas loci encode RNA-guided adaptive immune systems against mobile genetic elements (viruses, transposable elements and conjugative plasmids). Three types (I-III) of CRISPR systems have been identified. CRISPR clusters contain spacers, the sequences complementary to antecedent mobile elements. CRISPR clusters are transcribed and processed into mature CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) RNA (crRNA). The CRISPR-associated endonuclease, Cas9, belongs to the type II CRISPR/Cas system and has strong endonuclease activity to cut target DNA. Cas9 is guided by a mature crRNA that contains about 20 base pairs (bp) of unique target sequence (called spacer) and a trans-activating small RNA (tracrRNA) that serves as a guide for ribonuclease III-aided processing of pre-crRNA. The crRNA:tracrRNA duplex directs Cas9 to target DNA via complementary base pairing between the spacer on the crRNA and the complementary sequence (called protospacer) on the target DNA. Cas9 recognizes a trinucleotide (NGG) protospacer adjacent motif (PAM) to specify the cut site (the 3rd nucleotide from PAM). The crRNA and tracrRNA can be expressed separately or engineered into an artificial fusion small guide RNA (sgRNA) via a synthetic stem loop (AGAAAU) to mimic the natural crRNA/tracrRNA duplex. Such sgRNA, like shRNA, can be synthesized or in vitro transcribed for direct RNA transfection or expressed from a U6 or H1-promoted RNA expression vector, although cleavage efficiencies of the artificial sgRNA are lower than those for systems with the crRNA and tracrRNA expressed separately.
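The cut-site rule stated above (cleavage at the third nucleotide from the NGG PAM) also predicts the size of the deletion produced when two gRNAs cut the same DNA molecule. The sketch below illustrates that calculation under simplifying assumptions that are ours and not part of the disclosure: both protospacers lie on the strand supplied, each occurs once, cleavage is blunt, and the ends are rejoined without additional indels. The function names are hypothetical.

```python
def cut_index(seq: str, protospacer: str) -> int:
    """0-based index at which Cas9 is expected to cut `seq`.

    The cut falls 3 nt upstream (5') of the PAM, i.e. between the 17th
    and 18th nucleotide of a 20-nt protospacer. Only the given strand is
    searched (an assumption made for brevity).
    """
    start = seq.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found on this strand")
    pam = seq[start + len(protospacer): start + len(protospacer) + 3]
    if len(pam) != 3 or pam[1:] != "GG":
        raise ValueError("protospacer is not followed by an NGG PAM")
    return start + len(protospacer) - 3


def excise(seq: str, protospacer_a: str, protospacer_b: str):
    """Return (edited_sequence, deleted_bp) for a dual-gRNA excision."""
    first, second = sorted(
        (cut_index(seq, protospacer_a), cut_index(seq, protospacer_b))
    )
    return seq[:first] + seq[second:], second - first
```

Applied to real LTR coordinates, the second return value would correspond to deletions such as the 190-bp LTR-A/LTR-B excision reported in the figures; in practice repair often adds or removes a few bases at the junction, which this sketch ignores.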

The compositions of the invention can include a nucleic acid encoding a CRISPR-associated endonuclease. In some embodiments, the CRISPR-associated endonuclease can be a Cas9 nuclease. The Cas9 nuclease can have a nucleotide sequence identical to the wild type Streptococcus pyogenes sequence. In some embodiments, the CRISPR-associated endonuclease can be a sequence from other species, for example other Streptococcus species, such as thermophilus; Pseudomonas aeruginosa, Escherichia coli, or other sequenced bacteria genomes and archaea, or other prokaryotic microorganisms. Alternatively, the wild type Streptococcus pyogenes Cas9 sequence can be modified. The nucleic acid sequence can be codon optimized for efficient expression in mammalian cells, i.e., “humanized.” A humanized Cas9 nuclease sequence can be for example, the Cas9 nuclease sequence encoded by any of the expression vectors listed in Genbank accession numbers KM099231.1 GI:669193757; KM099232.1 GI:669193761; or KM099233.1 GI:669193765. Alternatively, the Cas9 nuclease sequence can be for example, the sequence contained within a commercially available vector such as PX330 or PX260 from Addgene (Cambridge, Mass.). In some embodiments, the Cas9 endonuclease can have an amino acid sequence that is a variant or a fragment of any of the Cas9 endonuclease sequences of Genbank accession numbers KM099231.1 GI:669193757; KM099232.1 GI:669193761; or KM099233.1 GI:669193765 or Cas9 amino acid sequence of PX330 or PX260 (Addgene, Cambridge, Mass.). The Cas9 nucleotide sequence can be modified to encode biologically active variants of Cas9, and these variants can have or can include, for example, an amino acid sequence that differs from a wild type Cas9 by virtue of containing one or more mutations (e.g., an addition, deletion, or substitution mutation or a combination of such mutations). One or more of the mutations can be a substitution (e.g., a conservative amino acid substitution).
For example, a biologically active variant of a Cas9 polypeptide can have an amino acid sequence with at least or about 50% sequence identity (e.g., at least or about 50%, 55%, 60%, 65%, 70%, 75%, 80%, 85%, 90%, 95%, 97%, 98%, or 99% sequence identity) to a wild type Cas9 polypeptide. Conservative amino acid substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine and threonine; lysine, histidine and arginine; and phenylalanine and tyrosine. The amino acid residues in the Cas9 amino acid sequence can be non-naturally occurring amino acid residues. Naturally occurring amino acid residues include those naturally encoded by the genetic code as well as non-standard amino acids (e.g., amino acids having the D-configuration instead of the L-configuration). The present peptides can also include amino acid residues that are modified versions of standard residues (e.g. pyrrolysine can be used in place of lysine and selenocysteine can be used in place of cysteine). Non-naturally occurring amino acid residues are those that have not been found in nature, but that conform to the basic formula of an amino acid and can be incorporated into a peptide. These include D-alloisoleucine (2R,3S)-2-amino-3-methylpentanoic acid and L-cyclopentyl glycine (S)-2-amino-2-cyclopentyl acetic acid. For other examples, one can consult textbooks or the worldwide web (a site is currently maintained by the California Institute of Technology and displays structures of non-natural amino acids that have been successfully incorporated into functional proteins).
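By way of illustration only, the conservative-substitution groups and the percent-identity thresholds recited above can be expressed as a short sketch. It assumes two pre-aligned, equal-length, ungapped amino acid strings in one-letter code; real comparisons against a full-length Cas9 would normally use a sequence alignment program. The names used are hypothetical.

```python
# Groups listed in the description, in one-letter amino acid code.
CONSERVATIVE_GROUPS = [
    set("GA"),    # glycine, alanine
    set("VIL"),   # valine, isoleucine, leucine
    set("DE"),    # aspartic acid, glutamic acid
    set("NQST"),  # asparagine, glutamine, serine, threonine
    set("KHR"),   # lysine, histidine, arginine
    set("FY"),    # phenylalanine, tyrosine
]


def is_conservative(original: str, replacement: str) -> bool:
    """True if the two different residues fall within one listed group."""
    if original == replacement:
        return False
    return any(
        original in group and replacement in group
        for group in CONSERVATIVE_GROUPS
    )


def percent_identity(reference: str, variant: str) -> float:
    """Percent of positions that match between two aligned sequences."""
    if not reference or len(reference) != len(variant):
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(a == b for a, b in zip(reference, variant))
    return 100.0 * matches / len(reference)
```

For instance, a five-residue toy peptide with a single lysine-to-arginine change would be 80% identical to its reference, and that change would be classed as conservative.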

The Cas9 nuclease sequence can be a mutated sequence. For example, the Cas9 nuclease can be mutated in the conserved HNH and RuvC domains, which are involved in strand-specific cleavage. For example, an aspartate-to-alanine (D10A) mutation in the RuvC catalytic domain allows the Cas9 nickase mutant (Cas9n) to nick rather than cleave DNA to yield single-stranded breaks, and the subsequent preferential repair through HDR can potentially decrease the frequency of unwanted indel mutations from off-target double-stranded breaks.

In some embodiments, compositions of the invention can include aCRISPR-associated endonuclease polypeptide encoded by any of the nucleicacid sequences described above. The terms “peptide,” “polypeptide,” and“protein” are used interchangeably herein, although typically they referto peptide sequences of varying sizes. We may refer to the aminoacid-based compositions of the invention as “polypeptides” to conveythat they are linear polymers of amino acid residues, and to helpdistinguish them from full-length proteins. A polypeptide of theinvention can “constitute” or “include” a fragment of aCRISPR-associated endonuclease, and the invention encompassespolypeptides that constitute or include biologically active variants ofa CRISPR-associated endonuclease. It will be understood that thepolypeptides can therefore include only a fragment of aCRISPR-associated endonuclease (or a biologically active variantthereof) but may include additional residues as well. Biologicallyactive variants will retain sufficient activity to cleave target DNA.

The bonds between the amino acid residues can be conventional peptidebonds or another covalent bond (such as an ester or ether bond), and thepolypeptides can be modified by amidation, phosphorylation orglycosylation. A modification can affect the polypeptide backbone and/orone or more side chains. Chemical modifications can be naturallyoccurring modifications made in vivo following translation of an mRNAencoding the polypeptide (e.g., glycosylation in a bacterial host) orsynthetic modifications made in vitro. A biologically active variant ofa CRISPR-associated endonuclease can include one or more structuralmodifications resulting from any combination of naturally occurring(i.e., made naturally in vivo) and synthetic modifications (i.e.,naturally occurring or non-naturally occurring modifications made invitro). Examples of modifications include, but are not limited to,amidation (e.g., replacement of the free carboxyl group at theC-terminus by an amino group); biotinylation (e.g., acylation of lysineor other reactive amino acid residues with a biotin molecule);glycosylation (e.g., addition of a glycosyl group to either asparagines,hydroxylysine, serine or threonine residues to generate a glycoproteinor glycopeptide); acetylation (e.g., the addition of an acetyl group,typically at the N-terminus of a polypeptide); alkylation (e.g., theaddition of an alkyl group); isoprenylation (e.g., the addition of anisoprenoid group); lipoylation (e.g. attachment of a lipoate moiety);and phosphorylation (e.g., addition of a phosphate group to serine,tyrosine, threonine or histidine).

One or more of the amino acid residues in a biologically active variant may be a non-naturally occurring amino acid residue. Naturally occurring amino acid residues include those naturally encoded by the genetic code as well as non-standard amino acids (e.g., amino acids having the D-configuration instead of the L-configuration). The present peptides can also include amino acid residues that are modified versions of standard residues (e.g., pyrrolysine can be used in place of lysine and selenocysteine can be used in place of cysteine). Non-naturally occurring amino acid residues are those that have not been found in nature, but that conform to the basic formula of an amino acid and can be incorporated into a peptide. These include D-alloisoleucine ((2R,3S)-2-amino-3-methylpentanoic acid) and L-cyclopentyl glycine ((S)-2-amino-2-cyclopentyl acetic acid). For other examples, one can consult textbooks or the worldwide web (a site is currently maintained by the California Institute of Technology and displays structures of non-natural amino acids that have been successfully incorporated into functional proteins).

Alternatively, or in addition, one or more of the amino acid residues in a biologically active variant can be a naturally occurring residue that differs from the naturally occurring residue found in the corresponding position in a wildtype sequence. In other words, biologically active variants can include one or more amino acid substitutions. We may refer to a substitution, addition, or deletion of amino acid residues as a mutation of the wildtype sequence. As noted, the substitution can replace a naturally occurring amino acid residue with a non-naturally occurring residue or just a different naturally occurring residue. Further, the substitution can constitute a conservative or non-conservative substitution. Conservative amino acid substitutions typically include substitutions within the following groups: glycine and alanine; valine, isoleucine, and leucine; aspartic acid and glutamic acid; asparagine, glutamine, serine and threonine; lysine, histidine and arginine; and phenylalanine and tyrosine.
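By way of illustration only, the conservative-substitution groups listed above can be expressed as a simple lookup. The following Python sketch is not part of the described methods; the function names and the example peptides are hypothetical, and mutations are labeled in the same residue-position-residue style as D10A.

```python
# Conservative-substitution groups as listed in the text (one-letter codes).
CONSERVATIVE_GROUPS = [
    {"G", "A"},            # glycine and alanine
    {"V", "I", "L"},       # valine, isoleucine, and leucine
    {"D", "E"},            # aspartic acid and glutamic acid
    {"N", "Q", "S", "T"},  # asparagine, glutamine, serine and threonine
    {"K", "H", "R"},       # lysine, histidine and arginine
    {"F", "Y"},            # phenylalanine and tyrosine
]


def is_conservative(wild_type: str, variant: str) -> bool:
    """Return True if replacing wild_type with variant stays within one group."""
    if wild_type == variant:
        return False  # identical residues are not a substitution
    return any(wild_type in g and variant in g for g in CONSERVATIVE_GROUPS)


def classify_substitutions(wt_seq: str, var_seq: str):
    """List (mutation label, is_conservative) for equal-length sequences."""
    if len(wt_seq) != len(var_seq):
        raise ValueError("sequences must have the same length (substitutions only)")
    results = []
    for pos, (a, b) in enumerate(zip(wt_seq, var_seq), start=1):
        if a != b:
            results.append((f"{a}{pos}{b}", is_conservative(a, b)))
    return results


# Hypothetical four-residue peptides, used only to exercise the functions.
print(classify_substitutions("MDKL", "MAKI"))
```

Under these groups, an aspartate-to-alanine change such as D10A is classified as non-conservative, which is consistent with its use to alter catalytic activity rather than preserve it.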

The polypeptides that are biologically active variants of a CRISPR-associated endonuclease can be characterized in terms of the extent to which their sequence is similar to or identical to the corresponding wild-type polypeptide. For example, the sequence of a biologically active variant can be at least or about 80% identical to corresponding residues in the wild-type polypeptide. For example, a biologically active variant of a CRISPR-associated endonuclease can have an amino acid sequence with at least or about 80% sequence identity (e.g., at least or about 85%, 90%, 95%, 97%, 98%, or 99% sequence identity) to a CRISPR-associated endonuclease or to a homolog or ortholog thereof.

A biologically active variant of a CRISPR-associated endonuclease polypeptide will retain sufficient biological activity to be useful in the present methods. The biologically active variants will retain sufficient activity to function in targeted DNA cleavage. The biological activity can be assessed in ways known to one of ordinary skill in the art and includes, without limitation, in vitro cleavage assays or functional assays.

Polypeptides can be generated by a variety of methods including, forexample, recombinant techniques or chemical synthesis. Once generated,polypeptides can be isolated and purified to any desired extent by meanswell known in the art. For example, one can use lyophilizationfollowing, for example, reversed phase (preferably) or normal phaseHPLC, or size exclusion or partition chromatography on polysaccharidegel media such as Sephadex G-25. The composition of the finalpolypeptide may be confirmed by amino acid analysis after degradation ofthe peptide by standard means, by amino acid sequencing, or by FAB-MStechniques. Salts, including acid salts, esters, amides, and N-acylderivatives of an amino group of a polypeptide may be prepared usingmethods known in the art, and such peptides are useful in the context ofthe present invention.

The compositions of the invention include a sequence encoding a guide RNA (gRNA) comprising a sequence that is complementary to a target sequence in a retrovirus. The retrovirus can be a lentivirus, for example, a human immunodeficiency virus, a simian immunodeficiency virus, a feline immunodeficiency virus or a bovine immunodeficiency virus. The human immunodeficiency virus can be HIV-1 or HIV-2. The target sequence can include a sequence from any HIV, for example, HIV-1 and HIV-2, and any circulating recombinant form thereof. The genetic variability of HIV is reflected in the multiple groups and subtypes that have been described. A collection of HIV sequences is compiled in the Los Alamos HIV databases and compendiums. The methods and compositions of the invention can be applied to HIV from any of those various groups, subtypes, and circulating recombinant forms. These include, for example, the HIV-1 major group (often referred to as Group M) and the minor groups, Groups N, O, and P, as well as, but not limited to, any of the following subtypes: A, B, C, D, F, G, H, J and K. The methods and compositions can also be applied to HIV-2 and any of the A, B, C, F or G clades (also referred to as “subtypes” or “groups”), as well as any circulating recombinant form of HIV-2.

The guide RNA can be a sequence complementary to a coding or a non-coding sequence. For example, the guide RNA can be complementary to an HIV sequence, such as a long terminal repeat (LTR) sequence, a protein coding sequence, or a regulatory sequence. In some embodiments, the guide RNA comprises a sequence that is complementary to an HIV long terminal repeat (LTR) region. The HIV-1 LTR is approximately 640 bp in length. An exemplary HIV-1 LTR is the sequence of SEQ ID NO: 376. An exemplary SIV LTR is the sequence of SEQ ID NO: 380. HIV-1 long terminal repeats (LTRs) are divided into U3, R and U5 regions. Exemplary HIV-1 LTR U3, R and U5 regions are SEQ ID NOs: 377, 378 and 379, respectively. Exemplary SIV LTR U3, R and U5 regions are SEQ ID NOs: 381, 382, and 383, respectively. The configurations of the U3, R, and U5 regions for exemplary HIV-1 and SIV sequences are shown in FIGS. 18 and 19, respectively. LTRs contain all of the required signals for gene expression and are involved in the integration of a provirus into the genome of a host cell. For example, the basal or core promoter, a core enhancer and a modulatory region are found within U3, while the transactivation response element is found within R. In HIV-1, the U5 region includes several sub-regions, for example, TAR or trans-acting responsive element, which is involved in transcriptional activation; Poly A, which is involved in dimerization and genome packaging; PBS or primer binding site; Psi or the packaging signal; and DIS or dimer initiation site.

Useful guide sequences are complementary to the U3, R, or U5 region ofthe LTR. Exemplary guide RNA sequences that target the U3 region ofHIV-1 are shown in FIG. 13. A guide RNA sequence can comprise, forexample, a sequence complementary to the target protospacer sequence of:

(SEQ ID NO: 96) LTR A: ATCAGATATCCACTGACCTTTGG,
(SEQ ID NO: 121) LTR B: CAGCAGTTCTTGAAGTACTCCGG,
(SEQ ID NO: 87) LTR C: GATTGGCAGAACTACACACCAGG, or
(SEQ ID NO: 110) LTR D: GCGTGGCCTGGGCGGGACTGGGG.

The locations of LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87) and LTR D (SEQ ID NO: 110) within the U3 (SEQ ID NO: 16) region are shown in FIG. 5. Additional exemplary guide RNA sequences that target the U3 region are listed in the table shown in FIG. 13 and can have the sequence of any of SEQ ID NOs: 79-111 and SEQ ID NOs: 111-141. In some embodiments, the guide sequence can comprise a sequence having 95% identity to any of SEQ ID NOs: 79-111 and SEQ ID NOs: 111-141. Thus, a guide RNA sequence can comprise, for example, a sequence having 95% identity to a sequence complementary to the target protospacer sequence of:

(SEQ ID NO: 96) LTR A: ATCAGATATCCACTGACCTTTGG,
(SEQ ID NO: 121) LTR B: CAGCAGTTCTTGAAGTACTCCGG,
(SEQ ID NO: 87) LTR C: GATTGGCAGAACTACACACCAGG, or
(SEQ ID NO: 110) LTR D: GCGTGGCCTGGGCGGGACTGGGG.

We may also refer to the guide RNA sequence as a spacer, e.g., spacer (A), spacer (B), spacer (C), and spacer (D).

The guide RNA sequence can be complementary to a sequence found within an HIV-1 U3, R, or U5 region reference sequence or consensus sequence. The invention is not so limited, however, and the guide RNA sequences can be selected to target any variant or mutant HIV sequence. In some embodiments, more than one guide RNA sequence is employed, for example a first guide RNA sequence and a second guide RNA sequence, with the first and second guide RNA sequences being complementary to target sequences in any of the above-mentioned retroviral regions. In some embodiments, the guide RNA can include a variant sequence or quasi-species sequence. In some embodiments, the guide RNA can be a sequence corresponding to a sequence in the genome of the virus harbored by the subject undergoing treatment. Thus, for example, the sequence of the particular U3, R, or U5 region in the HIV virus harbored by the subject can be obtained and guide RNAs complementary to the patient's particular sequences can be used.

In some embodiments, the guide RNA can be a sequence complementary to a protein coding sequence, for example, a sequence encoding one or more viral structural proteins (e.g., gag, pol, env and tat). Thus, the sequence can be complementary to a sequence within the gag polyprotein, e.g., MA (matrix protein, p17); CA (capsid protein, p24); SP1 (spacer peptide 1, p2); NC (nucleocapsid protein, p7); SP2 (spacer peptide 2, p1) and P6 protein; pol, e.g., reverse transcriptase (RT) and RNase H, integrase (IN), and HIV protease (PR); env, e.g., gp160, or a cleavage product of gp160, e.g., gp120 or SU, and gp41 or TM; or tat, e.g., the 72-amino acid one-exon Tat or the 86-101 amino-acid two-exon Tat. In some embodiments, the guide RNA can be a sequence complementary to a sequence encoding an accessory protein, including, for example, vif, nef (negative factor), vpu (Virus protein U) and tev.

In some embodiments, the sequence can be a sequence complementary to a structural or regulatory element, for example, an LTR, as described above; TAR (Target sequence for viral transactivation), the binding site for Tat protein and for cellular proteins, which consists of approximately the first 45 nucleotides of the viral mRNAs in HIV-1 (or the first 100 nucleotides in HIV-2) and forms a hairpin stem-loop structure; RRE (Rev responsive element), an RNA element encoded within the env region of HIV-1, consisting of approximately 200 nucleotides (positions 7710 to 8061 from the start of transcription in HIV-1, spanning the border of gp120 and gp41); PE (Psi element), a set of 4 stem-loop structures preceding and overlapping the Gag start codon; SLIP, a TTTTTT “slippery site” followed by a stem-loop structure; CRS (Cis-acting repressive sequences); and INS (Inhibitory/Instability RNA sequences), found, for example, at nucleotides 414 to 631 in the gag region of HIV-1.

The guide RNA sequence can be a sense or anti-sense sequence. The guide RNA sequence generally includes a proto-spacer adjacent motif (PAM). The sequence of the PAM can vary depending upon the specificity requirements of the CRISPR endonuclease used. In the CRISPR-Cas system derived from S. pyogenes, the target DNA typically immediately precedes a 5′-NGG proto-spacer adjacent motif (PAM). Thus, for the S. pyogenes Cas9, the PAM sequence can be AGG, TGG, CGG or GGG. Other Cas9 orthologs may have different PAM specificities. For example, Cas9 from S. thermophilus requires 5′-NNAGAA (for CRISPR1) and 5′-NGGNG (for CRISPR3), and Cas9 from Neisseria meningitidis requires 5′-NNNNGATT. The specific sequence of the guide RNA may vary, but, regardless of the sequence, useful guide RNA sequences will be those that minimize off-target effects while achieving high efficiency and complete ablation of the genomically integrated HIV-1 provirus. The length of the guide RNA sequence can vary from about 20 to about 60 or more nucleotides, for example about 20, about 21, about 22, about 23, about 24, about 25, about 26, about 27, about 28, about 29, about 30, about 31, about 32, about 33, about 34, about 35, about 36, about 37, about 38, about 39, about 40, about 45, about 50, about 55, about 60 or more nucleotides. Useful selection methods identify regions having extremely low homology between the foreign viral genome and the host cellular genome, including endogenous retroviral DNA, and include: bioinformatic screening using 12-bp+NGG target-selection criteria to exclude off-target human transcriptome or (even rarely) untranslated-genomic sites; avoiding transcription factor binding sites within the HIV-1 LTR promoter (potentially conserved in the host genome); selection of LTR-A- and -B-directed, 30-bp gRNAs and also a pre-crRNA system reflecting the original bacterial immune mechanism to enhance specificity/efficiency vs. a 20-bp gRNA-, chimeric crRNA-tracrRNA-based system; and WGS, Sanger sequencing and SURVEYOR assay to identify and exclude potential off-target effects.
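By way of illustration only, the 5′-NGG requirement described above for S. pyogenes Cas9 can be checked computationally. The following Python sketch splits each of the LTR A to LTR D target sequences given above into a protospacer and its trailing three-nucleotide PAM; the function and variable names are hypothetical, and the check applies only to the NGG motif, not to the PAMs of other Cas9 orthologs.

```python
import re

# Target sequences (protospacer followed by PAM) as given in the text.
LTR_TARGETS = {
    "LTR A": "ATCAGATATCCACTGACCTTTGG",  # SEQ ID NO: 96
    "LTR B": "CAGCAGTTCTTGAAGTACTCCGG",  # SEQ ID NO: 121
    "LTR C": "GATTGGCAGAACTACACACCAGG",  # SEQ ID NO: 87
    "LTR D": "GCGTGGCCTGGGCGGGACTGGGG",  # SEQ ID NO: 110
}

# 5'-NGG: any nucleotide followed by two guanines at the 3' end of the target.
NGG_PAM = re.compile(r"[ACGT]GG$")


def split_target(target: str):
    """Return (protospacer, PAM) for a target ending in a 5'-NGG PAM."""
    target = target.upper()
    if not re.fullmatch(r"[ACGT]+", target):
        raise ValueError("target must contain only A, C, G, or T")
    if not NGG_PAM.search(target):
        raise ValueError("target does not end in a 5'-NGG PAM")
    return target[:-3], target[-3:]


for name, seq in LTR_TARGETS.items():
    protospacer, pam = split_target(seq)
    print(name, protospacer, len(protospacer), pam)
```

Each of the four listed targets is 23 nucleotides long, i.e., a 20-nucleotide protospacer followed by a TGG, CGG, AGG, or GGG PAM, respectively.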

The guide RNA sequence can be configured as a single sequence or as acombination of one or more different sequences, e.g., a multiplexconfiguration. Multiplex configurations can include combinations of two,three, four, five, six, seven, eight, nine, ten, or more different guideRNAs, for example any combination of sequences in U3, R, or U5. In someembodiments, combinations of LTR A, LTR B, LTR C and LTR D can be used.In some embodiments, combinations of any of the sequences LTR A (SEQ IDNO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87), and LTR D (SEQID NO: 110), can be used. In some embodiments, any combinations of thesequences having the sequence of SEQ ID NOs: 79-111 and SEQ ID NOs:111-141 can be used. When the compositions are administered in anexpression vector, the guide RNAs can be encoded by a single vector.Alternatively, multiple vectors can be engineered to each include two ormore different guide RNAs. Useful configurations will result in theexcision of viral sequences between cleavage sites resulting in theablation of HIV genome or HIV protein expression. Thus, the use of twoor more different guide RNAs promotes excision of the viral sequencesbetween the cleavage sites recognized by the CRISPR endonuclease. Theexcised region can vary in size from a single nucleotide to severalthousand nucleotides. Exemplary excised regions are described in theexamples.
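By way of illustration only, the size of the region excised between two cleavage sites can be estimated from the positions of the target sequences. The following Python sketch assumes that S. pyogenes Cas9 makes a blunt cut three base pairs upstream of the PAM, that each target is written on the top strand as protospacer plus a three-nucleotide PAM, and that the "provirus" is a toy sequence assembled for the example; the function names and the toy sequence are hypothetical and are not taken from the examples of this disclosure.

```python
def cut_sites(genome: str, target: str, pam_len: int = 3, offset: int = 3):
    """Return every cut index for a top-strand target (protospacer + PAM).

    A cut index c means the break falls between genome[c - 1] and genome[c].
    All occurrences are reported, since an LTR target can occur in both LTRs.
    """
    sites = []
    start = genome.find(target)
    while start != -1:
        sites.append(start + len(target) - pam_len - offset)
        start = genome.find(target, start + 1)
    return sites


def excised_span(genome: str, targets):
    """Return (first_cut, last_cut) across all guides; needs at least two cuts."""
    sites = sorted(s for t in targets for s in cut_sites(genome, t))
    if len(sites) < 2:
        raise ValueError("at least two cleavage sites are needed for excision")
    return sites[0], sites[-1]


LTR_A = "ATCAGATATCCACTGACCTTTGG"  # SEQ ID NO: 96
LTR_B = "CAGCAGTTCTTGAAGTACTCCGG"  # SEQ ID NO: 121

# Toy sequence (an assumption for illustration): two targets 50 nt apart.
toy = "TTTT" + LTR_A + "A" * 50 + LTR_B + "CCCC"

first, last = excised_span(toy, [LTR_A, LTR_B])
print(first, last, last - first)  # cut positions and excised length
```

In an integrated provirus, a single LTR-directed guide can have a site in each of the two LTRs, which is why the sketch collects every occurrence of a target rather than only the first.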

When the compositions are administered as a nucleic acid or are contained within an expression vector, the CRISPR endonuclease can be encoded by the same nucleic acid or vector as the guide RNA sequences. Alternatively or in addition, the CRISPR endonuclease can be encoded in a physically separate nucleic acid from the guide RNA sequences or in a separate vector.

In some embodiments, the RNA molecules, e.g., crRNA, tracrRNA, gRNA, are engineered to comprise one or more modified nucleobases. For example, known modifications of RNA molecules can be found in Genes VI, Chapter 9 (“Interpreting the Genetic Code”), Lewin, ed. (1997, Oxford University Press, New York), and Modification and Editing of RNA, Grosjean and Benne, eds. (1998, ASM Press, Washington D.C.). Modified RNA components include the following: 2′-O-methylcytidine; N⁴-methylcytidine; N⁴,2′-O-dimethylcytidine; N⁴-acetylcytidine; 5-methylcytidine; 5,2′-O-dimethylcytidine; 5-hydroxymethylcytidine; 5-formylcytidine; 2′-O-methyl-5-formylcytidine; 3-methylcytidine; 2-thiocytidine; lysidine; 2′-O-methyluridine; 2-thiouridine; 2-thio-2′-O-methyluridine; 3,2′-O-dimethyluridine; 3-(3-amino-3-carboxypropyl)uridine; 4-thiouridine; ribosylthymine; 5,2′-O-dimethyluridine; 5-methyl-2-thiouridine; 5-hydroxyuridine; 5-methoxyuridine; uridine 5-oxyacetic acid; uridine 5-oxyacetic acid methyl ester; 5-carboxymethyluridine; 5-methoxycarbonylmethyluridine; 5-methoxycarbonylmethyl-2′-O-methyluridine; 5-methoxycarbonylmethyl-2-thiouridine; 5-carbamoylmethyluridine; 5-carbamoylmethyl-2′-O-methyluridine; 5-(carboxyhydroxymethyl)uridine; 5-(carboxyhydroxymethyl)uridine methyl ester; 5-aminomethyl-2-thiouridine; 5-methylaminomethyluridine; 5-methylaminomethyl-2-thiouridine; 5-methylaminomethyl-2-selenouridine; 5-carboxymethylaminomethyluridine; 5-carboxymethylaminomethyl-2′-O-methyluridine; 5-carboxymethylaminomethyl-2-thiouridine; dihydrouridine; dihydroribosylthymine; 2′-O-methyladenosine; 2-methyladenosine; N⁶-methyladenosine; N⁶,N⁶-dimethyladenosine; N⁶,2′-O-trimethyladenosine; 2-methylthio-N⁶-isopentenyladenosine; N⁶-(cis-hydroxyisopentenyl)-adenosine; 2-methylthio-N⁶-(cis-hydroxyisopentenyl)-adenosine; N⁶-glycinylcarbamoyl adenosine; N⁶-threonylcarbamoyl adenosine; N⁶-methyl-N⁶-threonylcarbamoyl adenosine; 2-methylthio-N⁶-methyl-N⁶-threonylcarbamoyl adenosine; N⁶-hydroxynorvalylcarbamoyl adenosine; 2-methylthio-N⁶-hydroxynorvalylcarbamoyl adenosine; 2′-O-ribosyladenosine (phosphate); inosine; 2′-O-methyl inosine; 1-methyl inosine; 1,2′-O-dimethyl inosine; 2′-O-methyl guanosine; 1-methyl guanosine; N²-methyl guanosine; N²,N²-dimethyl guanosine; N²,2′-O-dimethyl guanosine; N²,N²,2′-O-trimethyl guanosine; 2′-O-ribosyl guanosine (phosphate); 7-methyl guanosine; N²,7-dimethyl guanosine; N²,N²,7-trimethyl guanosine; wyosine; methylwyosine; under-modified hydroxywybutosine; wybutosine; hydroxywybutosine; peroxywybutosine; queuosine; epoxyqueuosine; galactosyl-queuosine; mannosyl-queuosine; 7-cyano-7-deazaguanosine; archaeosine [also called 7-formamido-7-deazaguanosine]; and 7-aminomethyl-7-deazaguanosine.

We may use the terms “nucleic acid” and “polynucleotide” interchangeably to refer to both RNA and DNA, including cDNA, genomic DNA, synthetic DNA, and DNA (or RNA) containing nucleic acid analogs, any of which may encode a polypeptide of the invention and all of which are encompassed by the invention. Polynucleotides can have essentially any three-dimensional structure. A nucleic acid can be double-stranded or single-stranded (i.e., a sense strand or an antisense strand). Non-limiting examples of polynucleotides include genes, gene fragments, exons, introns, messenger RNA (mRNA) and portions thereof, transfer RNA, ribosomal RNA, siRNA, micro-RNA, ribozymes, cDNA, recombinant polynucleotides, branched polynucleotides, plasmids, vectors, isolated DNA of any sequence, isolated RNA of any sequence, nucleic acid probes, and primers, as well as nucleic acid analogs. In the context of the present invention, nucleic acids can encode a fragment of a naturally occurring Cas9 or a biologically active variant thereof and a guide RNA, wherein the guide RNA is complementary to a sequence in HIV.

An “isolated” nucleic acid can be, for example, a naturally-occurringDNA molecule or a fragment thereof, provided that at least one of thenucleic acid sequences normally found immediately flanking that DNAmolecule in a naturally-occurring genome is removed or absent. Thus, anisolated nucleic acid includes, without limitation, a DNA molecule thatexists as a separate molecule, independent of other sequences (e.g., achemically synthesized nucleic acid, or a cDNA or genomic DNA fragmentproduced by the polymerase chain reaction (PCR) or restrictionendonuclease treatment). An isolated nucleic acid also refers to a DNAmolecule that is incorporated into a vector, an autonomously replicatingplasmid, a virus, or into the genomic DNA of a prokaryote or eukaryote.In addition, an isolated nucleic acid can include an engineered nucleicacid such as a DNA molecule that is part of a hybrid or fusion nucleicacid. A nucleic acid existing among many (e.g., dozens, or hundreds tomillions) of other nucleic acids within, for example, cDNA libraries orgenomic libraries, or gel slices containing a genomic DNA restrictiondigest, is not an isolated nucleic acid.

Isolated nucleic acid molecules can be produced by standard techniques.For example, polymerase chain reaction (PCR) techniques can be used toobtain an isolated nucleic acid containing a nucleotide sequencedescribed herein, including nucleotide sequences encoding a polypeptidedescribed herein. PCR can be used to amplify specific sequences from DNAas well as RNA, including sequences from total genomic DNA or totalcellular RNA. Various PCR methods are described in, for example, PCRPrimer: A Laboratory Manual, Dieffenbach and Dveksler, eds., Cold SpringHarbor Laboratory Press, 1995. Generally, sequence information from theends of the region of interest or beyond is employed to designoligonucleotide primers that are identical or similar in sequence toopposite strands of the template to be amplified. Various PCR strategiesalso are available by which site-specific nucleotide sequencemodifications can be introduced into a template nucleic acid.

Isolated nucleic acids also can be chemically synthesized, either as asingle nucleic acid molecule (e.g., using automated DNA synthesis in the3′ to 5′ direction using phosphoramidite technology) or as a series ofoligonucleotides. For example, one or more pairs of longoligonucleotides (e.g., >50-100 nucleotides) can be synthesized thatcontain the desired sequence, with each pair containing a short segmentof complementarity (e.g., about 15 nucleotides) such that a duplex isformed when the oligonucleotide pair is annealed. DNA polymerase is usedto extend the oligonucleotides, resulting in a single, double-strandednucleic acid molecule per oligonucleotide pair, which then can beligated into a vector. Isolated nucleic acids of the invention also canbe obtained by mutagenesis of, e.g., a naturally occurring portion of aCas9-encoding DNA (in accordance with, for example, the formula above).

Two nucleic acids or the polypeptides they encode may be described as having a certain degree of identity to one another. For example, a Cas9 protein and a biologically active variant thereof may be described as exhibiting a certain degree of identity. Alignments may be assembled by locating short Cas9 sequences in the Protein Information Resource (PIR) site, followed by analysis with the “short nearly identical sequences” Basic Local Alignment Search Tool (BLAST) algorithm on the NCBI website.

As used herein, the term “percent sequence identity” refers to thedegree of identity between any given query sequence and a subjectsequence. For example, a naturally occurring Cas9 can be the querysequence and a fragment of a Cas9 protein can be the subject sequence.Similarly, a fragment of a Cas9 protein can be the query sequence and abiologically active variant thereof can be the subject sequence.

To determine sequence identity, a query nucleic acid or amino acidsequence can be aligned to one or more subject nucleic acid or aminoacid sequences, respectively, using the computer program ClustalW(version 1.83, default parameters), which allows alignments of nucleicacid or protein sequences to be carried out across their entire length(global alignment). See Chenna et al., Nucleic Acids Res. 31:3497-3500,2003.

ClustalW calculates the best match between a query and one or more subject sequences and aligns them so that identities, similarities and differences can be determined. Gaps of one or more residues can be inserted into a query sequence, a subject sequence, or both, to maximize sequence alignments. For fast pairwise alignment of nucleic acid sequences, the following default parameters are used: word size: 2; window size: 4; scoring method: percentage; number of top diagonals: 4; and gap penalty: 5. For multiple alignments of nucleic acid sequences, the following parameters are used: gap opening penalty: 10.0; gap extension penalty: 5.0; and weight transitions: yes. For fast pairwise alignment of protein sequences, the following parameters are used: word size: 1; window size: 5; scoring method: percentage; number of top diagonals: 5; gap penalty: 3. For multiple alignment of protein sequences, the following parameters are used: weight matrix: blosum; gap opening penalty: 10.0; gap extension penalty: 0.05; hydrophilic gaps: on; hydrophilic residues: Gly, Pro, Ser, Asn, Asp, Gln, Glu, Arg, and Lys; residue-specific gap penalties: on. The output is a sequence alignment that reflects the relationship between sequences. ClustalW can be run, for example, at the Baylor College of Medicine Search Launcher site (searchlauncher.bcm.tmc.edu/multi-align/multi-align.html) and at the European Bioinformatics Institute site on the World Wide Web (ebi.ac.uk/clustalw).

To determine a percent identity between a query sequence and a subject sequence, ClustalW divides the number of identities in the best alignment by the number of residues compared (gap positions are excluded), and multiplies the result by 100. The output is the percent identity of the subject sequence with respect to the query sequence. It is noted that the percent identity value can be rounded to the nearest tenth. For example, 78.11, 78.12, 78.13, and 78.14 are rounded down to 78.1, while 78.15, 78.16, 78.17, 78.18, and 78.19 are rounded up to 78.2.
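By way of illustration only, the percent identity calculation and rounding rule described above can be sketched as follows. This Python example is not ClustalW itself; it operates on two already-aligned sequences supplied by the user, the function name is hypothetical, and decimal arithmetic with half-up rounding is used so that a value such as 78.15 rounds up to 78.2 as stated.

```python
from decimal import Decimal, ROUND_HALF_UP


def percent_identity(aligned_query: str, aligned_subject: str, gap: str = "-") -> Decimal:
    """Identities / residues compared x 100, gap positions excluded,
    rounded to the nearest tenth (halves round up)."""
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    identities = 0
    for q, s in zip(aligned_query, aligned_subject):
        if q == gap or s == gap:
            continue  # gap positions are not counted as residues compared
        compared += 1
        if q.upper() == s.upper():
            identities += 1
    if compared == 0:
        raise ValueError("no ungapped positions to compare")
    value = Decimal(identities) * 100 / Decimal(compared)
    return value.quantize(Decimal("0.1"), rounding=ROUND_HALF_UP)


# Hypothetical aligned fragments, used only to exercise the function.
print(percent_identity("ACGT-A", "ACGTTA"))  # gap column is skipped
print(percent_identity("ACG", "ACT"))        # 2 of 3 positions identical
```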

The nucleic acids and polypeptides described herein may be referred toas “exogenous”. The term “exogenous” indicates that the nucleic acid orpolypeptide is part of, or encoded by, a recombinant nucleic acidconstruct, or is not in its natural environment. For example, anexogenous nucleic acid can be a sequence from one species introducedinto another species, i.e., a heterologous nucleic acid. Typically, suchan exogenous nucleic acid is introduced into the other species via arecombinant nucleic acid construct. An exogenous nucleic acid can alsobe a sequence that is native to an organism and that has beenreintroduced into cells of that organism. An exogenous nucleic acid thatincludes a native sequence can often be distinguished from the naturallyoccurring sequence by the presence of non-natural sequences linked tothe exogenous nucleic acid, e.g., non-native regulatory sequencesflanking a native sequence in a recombinant nucleic acid construct. Inaddition, stably transformed exogenous nucleic acids typically areintegrated at positions other than the position where the nativesequence is found.

Recombinant constructs are also provided herein and can be used totransform cells in order to express Cas9 and/or a guide RNAcomplementary to a target sequence in HIV. A recombinant nucleic acidconstruct comprises a nucleic acid encoding a Cas9 and/or a guide RNAcomplementary to a target sequence in HIV as described herein, operablylinked to a regulatory region suitable for expressing the Cas9 and/or aguide RNA complementary to a target sequence in HIV in the cell. It willbe appreciated that a number of nucleic acids can encode a polypeptidehaving a particular amino acid sequence. The degeneracy of the geneticcode is well known in the art. For many amino acids, there is more thanone nucleotide triplet that serves as the codon for the amino acid. Forexample, codons in the coding sequence for Cas9 can be modified suchthat optimal expression in a particular organism is obtained, usingappropriate codon bias tables for that organism.
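By way of illustration only, the degeneracy of the genetic code and the idea of selecting codons for a particular organism can be sketched as follows. In this Python example the codon assignments are standard, but the ordering of synonymous codons (first listed treated as "preferred") is an illustrative assumption and not a measured codon bias table; the peptide and the function names are likewise hypothetical.

```python
# Synonymous codons for a few amino acids (standard genetic code).
# The first codon in each list is treated as "preferred" purely for
# illustration; a real design would consult a codon bias table.
SYNONYMOUS_CODONS = {
    "M": ["ATG"],
    "D": ["GAC", "GAT"],
    "K": ["AAG", "AAA"],
    "Y": ["TAC", "TAT"],
    "S": ["AGC", "TCT"],
    "I": ["ATC", "ATT"],
    "G": ["GGC", "GGT"],
    "L": ["CTG", "TTA"],
}

# Reverse lookup used to confirm that both encodings give the same peptide.
CODON_TO_AA = {c: aa for aa, codons in SYNONYMOUS_CODONS.items() for c in codons}


def encode(peptide: str, preferred: bool = True) -> str:
    """Back-translate a peptide using the first (or last) listed codon."""
    index = 0 if preferred else -1
    return "".join(SYNONYMOUS_CODONS[aa][index] for aa in peptide)


def translate(dna: str) -> str:
    """Translate a coding sequence built from the codons in the table above."""
    if len(dna) % 3 != 0:
        raise ValueError("coding sequence length must be a multiple of three")
    return "".join(CODON_TO_AA[dna[i:i + 3]] for i in range(0, len(dna), 3))


peptide = "MDKKYSIGL"  # arbitrary example peptide
optimized = encode(peptide, preferred=True)
alternative = encode(peptide, preferred=False)
print(optimized)
print(alternative)
print(translate(optimized) == translate(alternative) == peptide)
```

The two nucleotide sequences differ at most codons yet encode the same amino acid sequence, which is the property that codon optimization relies on.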

Vectors containing nucleic acids such as those described herein also areprovided. A “vector” is a replicon, such as a plasmid, phage, or cosmid,into which another DNA segment may be inserted so as to bring about thereplication of the inserted segment. Generally, a vector is capable ofreplication when associated with the proper control elements. Suitablevector backbones include, for example, those routinely used in the artsuch as plasmids, viruses, artificial chromosomes, BACs, YACs, or PACs.The term “vector” includes cloning and expression vectors, as well asviral vectors and integrating vectors. An “expression vector” is avector that includes a regulatory region. A wide variety ofhost/expression vector combinations may be used to express the nucleicacid sequences described herein. Suitable expression vectors include,without limitation, plasmids and viral vectors derived from, forexample, bacteriophage, baculoviruses, and retroviruses. Numerousvectors and expression systems are commercially available from suchcorporations as Novagen (Madison, Wis.), Clontech (Palo Alto, Calif.),Stratagene (La Jolla, Calif.), and Invitrogen/Life Technologies(Carlsbad, Calif.).

The vectors provided herein also can include, for example, origins ofreplication, scaffold attachment regions (SARs), and/or markers. Amarker gene can confer a selectable phenotype on a host cell. Forexample, a marker can confer biocide resistance, such as resistance toan antibiotic (e.g., kanamycin, G418, bleomycin, or hygromycin). Asnoted above, an expression vector can include a tag sequence designed tofacilitate manipulation or detection (e.g., purification orlocalization) of the expressed polypeptide. Tag sequences, such as greenfluorescent protein (GFP), glutathione S-transferase (GST),polyhistidine, c-myc, hemagglutinin, or Flag™ tag (Kodak, New Haven,Conn.) sequences typically are expressed as a fusion with the encodedpolypeptide. Such tags can be inserted anywhere within the polypeptide,including at either the carboxyl or amino terminus.

Additional expression vectors also can include, for example, segments of chromosomal, non-chromosomal and synthetic DNA sequences. Suitable vectors include derivatives of SV40 and known bacterial plasmids, e.g., E. coli plasmids ColE1, pCR1, pBR322, pMal-C2, pET, pGEX, pMB9 and their derivatives, plasmids such as RP4; phage DNAs, e.g., the numerous derivatives of phage λ, e.g., NM989, and other phage DNA, e.g., M13 and filamentous single stranded phage DNA; yeast plasmids such as the 2μ plasmid or derivatives thereof; vectors useful in eukaryotic cells, such as vectors useful in insect or mammalian cells; vectors derived from combinations of plasmids and phage DNAs, such as plasmids that have been modified to employ phage DNA or other expression control sequences.

Yeast expression systems can also be used. For example, the non-fusion pYES2 vector (XbaI, SphI, XhoI, NotI, BstXI, EcoRI, BstXI, BamHI, SacI, KpnI, and HindIII cloning sites; Invitrogen) or the fusion pYESHisA, B, C (XbaI, SphI, XhoI, NotI, BstXI, EcoRI, BamHI, SacI, KpnI, and HindIII cloning sites, N-terminal peptide purified with ProBond resin and cleaved with enterokinase; Invitrogen), to mention just two, can be employed according to the invention. A yeast two-hybrid expression system can also be prepared in accordance with the invention.

The vector can also include a regulatory region. The term “regulatory region” refers to nucleotide sequences that influence transcription or translation initiation and rate, and stability and/or mobility of a transcription or translation product. Regulatory regions include, without limitation, promoter sequences, enhancer sequences, response elements, protein recognition sites, inducible elements, protein binding sequences, 5′ and 3′ untranslated regions (UTRs), transcriptional start sites, termination sequences, polyadenylation sequences, nuclear localization signals, and introns.

As used herein, the term “operably linked” refers to positioning of a regulatory region and a sequence to be transcribed in a nucleic acid so as to influence transcription or translation of such a sequence. For example, to bring a coding sequence under the control of a promoter, the translation initiation site of the translational reading frame of the polypeptide is typically positioned between one and about fifty nucleotides downstream of the promoter. A promoter can, however, be positioned as much as about 5,000 nucleotides upstream of the translation initiation site or about 2,000 nucleotides upstream of the transcription start site. A promoter typically comprises at least a core (basal) promoter. A promoter also may include at least one control element, such as an enhancer sequence, an upstream element or an upstream activation region (UAR). The choice of promoters to be included depends upon several factors, including, but not limited to, efficiency, selectability, inducibility, desired expression level, and cell- or tissue-preferential expression. It is a routine matter for one of skill in the art to modulate the expression of a coding sequence by appropriately selecting and positioning promoters and other regulatory regions relative to the coding sequence.

Vectors include, for example, viral vectors (such as adenoviruses (“Ad”), adeno-associated viruses (AAV), vesicular stomatitis virus (VSV), and retroviruses), liposomes and other lipid-containing complexes, and other macromolecular complexes capable of mediating delivery of a polynucleotide to a host cell. Vectors can also comprise other components or functionalities that further modulate gene delivery and/or gene expression, or that otherwise provide beneficial properties to the targeted cells. As described and illustrated in more detail below, such other components include, for example, components that influence binding or targeting to cells (including components that mediate cell-type or tissue-specific binding); components that influence uptake of the vector nucleic acid by the cell; components that influence localization of the polynucleotide within the cell after uptake (such as agents mediating nuclear localization); and components that influence expression of the polynucleotide. Such components also might include markers, such as detectable and/or selectable markers that can be used to detect or select for cells that have taken up and are expressing the nucleic acid delivered by the vector. Such components can be provided as a natural feature of the vector (such as the use of certain viral vectors which have components or functionalities mediating binding and uptake), or vectors can be modified to provide such functionalities. Other vectors include those described by Chen et al., BioTechniques, 34: 167-171 (2003). A large variety of such vectors are known in the art and are generally available.

A “recombinant viral vector” refers to a viral vector comprising one or more heterologous gene products or sequences. Since many viral vectors exhibit size-constraints associated with packaging, the heterologous gene products or sequences are typically introduced by replacing one or more portions of the viral genome. Such viruses may become replication-defective, requiring the deleted function(s) to be provided in trans during viral replication and encapsidation (by using, e.g., a helper virus or a packaging cell line carrying gene products necessary for replication and/or encapsidation). Modified viral vectors in which a polynucleotide to be delivered is carried on the outside of the viral particle have also been described (see, e.g., Curiel, D T, et al. PNAS 88: 8850-8854, 1991).

Suitable nucleic acid delivery systems include a recombinant viral vector, typically comprising sequence from at least one of an adenovirus, adeno-associated virus (AAV), helper-dependent adenovirus, retrovirus, or hemagglutinating virus of Japan-liposome (HVJ) complex. In such cases, the viral vector comprises a strong eukaryotic promoter, e.g., a cytomegalovirus (CMV) promoter, operably linked to the polynucleotide. The recombinant viral vector can include one or more of the polynucleotides therein, preferably about one polynucleotide. In some embodiments, the viral vector used in the invention methods has a pfu (plaque forming units) of from about 10⁸ to about 5×10¹⁰ pfu. In embodiments in which the polynucleotide is to be administered with a non-viral vector, use of from about 0.1 nanograms to about 4000 micrograms will often be useful, e.g., about 1 nanogram to about 100 micrograms.

Additional vectors include viral vectors, fusion proteins and chemical conjugates. Retroviral vectors include Moloney murine leukemia viruses and HIV-based viruses. One HIV-based viral vector comprises at least two vectors wherein the gag and pol genes are from an HIV genome and the env gene is from another virus. DNA viral vectors include pox vectors such as orthopox or avipox vectors, herpesvirus vectors such as a herpes simplex I virus (HSV) vector [Geller, A. I. et al., J. Neurochem, 64: 487 (1995); Lim, F., et al., in DNA Cloning: Mammalian Systems, D. Glover, Ed. (Oxford Univ. Press, Oxford England) (1995); Geller, A. I. et al., Proc. Natl. Acad. Sci. USA 90:7603 (1993); Geller, A. I., et al., Proc. Natl. Acad. Sci. USA 87:1149 (1990)], Adenovirus Vectors [LeGal LaSalle et al., Science, 259:988 (1993); Davidson, et al., Nat. Genet. 3: 219 (1993); Yang, et al., J. Virol. 69: 2004 (1995)] and Adeno-associated Virus Vectors [Kaplitt, M. G., et al., Nat. Genet. 8:148 (1994)].

Pox viral vectors introduce the gene into the cell's cytoplasm. Avipox virus vectors result in only a short term expression of the nucleic acid. Adenovirus vectors, adeno-associated virus vectors and herpes simplex virus (HSV) vectors may be indicated for some invention embodiments. The adenovirus vector results in a shorter term expression (e.g., less than about a month) than adeno-associated virus, which, in some embodiments, may exhibit much longer expression. The particular vector chosen will depend upon the target cell and the condition being treated. The selection of appropriate promoters can readily be accomplished. An example of a suitable promoter is the 763-base-pair cytomegalovirus (CMV) promoter. Other suitable promoters which may be used for gene expression include, but are not limited to, the Rous sarcoma virus (RSV) promoter (Davis, et al., Hum Gene Ther 4:151 (1993)), the SV40 early promoter region, the herpes thymidine kinase promoter, the regulatory sequences of the metallothionein (MMT) gene, prokaryotic expression vectors such as the β-lactamase promoter, the tac promoter, promoter elements from yeast or other fungi such as the Gal 4 promoter, the ADC (alcohol dehydrogenase) promoter, PGK (phosphoglycerate kinase) promoter, alkaline phosphatase promoter; and the animal transcriptional control regions, which exhibit tissue specificity and have been utilized in transgenic animals: elastase I gene control region which is active in pancreatic acinar cells, insulin gene control region which is active in pancreatic beta cells, immunoglobulin gene control region which is active in lymphoid cells, mouse mammary tumor virus control region which is active in testicular, breast, lymphoid and mast cells, albumin gene control region which is active in liver, alpha-fetoprotein gene control region which is active in liver, alpha 1-antitrypsin gene control region which is active in the liver, beta-globin gene control region which is active in myeloid cells, myelin basic protein gene control region which is active in oligodendrocyte cells in the brain, myosin light chain-2 gene control region which is active in skeletal muscle, and gonadotropic releasing hormone gene control region which is active in the hypothalamus. Certain proteins can be expressed using their native promoter. Other elements that can enhance expression can also be included, such as an enhancer or a system that results in high levels of expression such as a tat gene and tar element. This cassette can then be inserted into a vector, e.g., a plasmid vector such as pUC19, pUC118, pBR322, or other known plasmid vectors, that includes, for example, an E. coli origin of replication. See, Sambrook, et al., Molecular Cloning: A Laboratory Manual, Cold Spring Harbor Laboratory Press, (1989). The plasmid vector may also include a selectable marker such as the β-lactamase gene for ampicillin resistance, provided that the marker polypeptide does not adversely affect the metabolism of the organism being treated. The cassette can also be bound to a nucleic acid binding moiety in a synthetic delivery system, such as the system disclosed in WO 95/22618.

If desired, the polynucleotides of the invention may also be used with a microdelivery vehicle such as cationic liposomes and adenoviral vectors. For a review of the procedures for liposome preparation, targeting and delivery of contents, see Mannino and Gould-Fogerite, BioTechniques, 6:682 (1988). See also, Felgner and Holm, Bethesda Res. Lab. Focus, 11(2):21 (1989) and Maurer, R. A., Bethesda Res. Lab. Focus, 11(2):25 (1989).

Replication-defective recombinant adenoviral vectors can be produced in accordance with known techniques. See, Quantin, et al., Proc. Natl. Acad. Sci. USA, 89:2581-2584 (1992); Stratford-Perricaudet, et al., J. Clin. Invest., 90:626-630 (1992); and Rosenfeld, et al., Cell, 68:143-155 (1992).

Another delivery method is to use single stranded DNA producing vectors which can produce the expressed products intracellularly. See for example, Chen et al., BioTechniques, 34: 167-171 (2003), which is incorporated herein, by reference, in its entirety.

Pharmaceutical Compositions

As described above, the compositions of the present invention can be prepared in a variety of ways known to one of ordinary skill in the art. Regardless of their original source or the manner in which they are obtained, the compositions of the invention can be formulated in accordance with their use. For example, the nucleic acids and vectors described above can be formulated within compositions for application to cells in tissue culture or for administration to a patient or subject. Any of the pharmaceutical compositions of the invention can be formulated for use in the preparation of a medicament, and particular uses are indicated below in the context of treatment, e.g., the treatment of a subject having an HIV infection or at risk for contracting an HIV infection. When employed as pharmaceuticals, any of the nucleic acids and vectors can be administered in the form of pharmaceutical compositions. These compositions can be prepared in a manner well known in the pharmaceutical art, and can be administered by a variety of routes, depending upon whether local or systemic treatment is desired and upon the area to be treated. Administration may be topical (including ophthalmic and to mucous membranes including intranasal, vaginal and rectal delivery), pulmonary (e.g., by inhalation or insufflation of powders or aerosols, including by nebulizer; intratracheal, intranasal, epidermal and transdermal), ocular, oral or parenteral. Methods for ocular delivery can include topical administration (eye drops), subconjunctival, periocular or intravitreal injection or introduction by balloon catheter or ophthalmic inserts surgically placed in the conjunctival sac. Parenteral administration includes intravenous, intra-arterial, subcutaneous, intraperitoneal or intramuscular injection or infusion; or intracranial, e.g., intrathecal or intraventricular administration. Parenteral administration can be in the form of a single bolus dose, or may be, for example, by a continuous perfusion pump.
Pharmaceutical compositions and formulations for topical administration may include transdermal patches, ointments, lotions, creams, gels, drops, suppositories, sprays, liquids, powders, and the like. Conventional pharmaceutical carriers, aqueous, powder or oily bases, thickeners and the like may be necessary or desirable.

This invention also includes pharmaceutical compositions which contain, as the active ingredient, nucleic acids and vectors described herein in combination with one or more pharmaceutically acceptable carriers. We use the terms “pharmaceutically acceptable” (or “pharmacologically acceptable”) to refer to molecular entities and compositions that do not produce an adverse, allergic or other untoward reaction when administered to an animal or a human, as appropriate. The term “pharmaceutically acceptable carrier,” as used herein, includes any and all solvents, dispersion media, coatings, antibacterial, isotonic and absorption delaying agents, buffers, excipients, binders, lubricants, gels, surfactants and the like, that may be used as media for a pharmaceutically acceptable substance. In making the compositions of the invention, the active ingredient is typically mixed with an excipient, diluted by an excipient or enclosed within such a carrier in the form of, for example, a capsule, tablet, sachet, paper, or other container. When the excipient serves as a diluent, it can be a solid, semisolid, or liquid material (e.g., normal saline), which acts as a vehicle, carrier or medium for the active ingredient. Thus, the compositions can be in the form of tablets, pills, powders, lozenges, sachets, cachets, elixirs, suspensions, emulsions, solutions, syrups, aerosols (as a solid or in a liquid medium), lotions, creams, ointments, gels, soft and hard gelatin capsules, suppositories, sterile injectable solutions, and sterile packaged powders. As is known in the art, the type of diluent can vary depending upon the intended route of administration. The resulting compositions can include additional agents, such as preservatives. In some embodiments, the carrier can be, or can include, a lipid-based or polymer-based colloid. In some embodiments, the carrier material can be a colloid formulated as a liposome, a hydrogel, a microparticle, a nanoparticle, or a block copolymer micelle.
As noted, the carrier material can form a capsule, and that material may be a polymer-based colloid.

The nucleic acid sequences of the invention can be delivered to an appropriate cell of a subject. This can be achieved by, for example, the use of a polymeric, biodegradable microparticle or microcapsule delivery vehicle, sized to optimize phagocytosis by phagocytic cells such as macrophages. For example, PLGA (poly-lacto-co-glycolide) microparticles approximately 1-10 μm in diameter can be used. The polynucleotide is encapsulated in these microparticles, which are taken up by macrophages and gradually biodegraded within the cell, thereby releasing the polynucleotide. Once released, the DNA is expressed within the cell. A second type of microparticle is intended not to be taken up directly by cells, but rather to serve primarily as a slow-release reservoir of nucleic acid that is taken up by cells only upon release from the microparticle through biodegradation. These polymeric particles should therefore be large enough to preclude phagocytosis (i.e., larger than 5 μm and preferably larger than 20 μm). Another way to achieve uptake of the nucleic acid is using liposomes, prepared by standard methods. The nucleic acids can be incorporated alone into these delivery vehicles or co-incorporated with tissue-specific antibodies, for example antibodies that target cell types that are commonly latently infected reservoirs of HIV infection, for example, brain macrophages, microglia, astrocytes, and gut-associated lymphoid cells. Alternatively, one can prepare a molecular complex composed of a plasmid or other vector attached to poly-L-lysine by electrostatic or covalent forces. Poly-L-lysine binds to a ligand that can bind to a receptor on target cells. Delivery of “naked DNA” (i.e., without a delivery vehicle) to an intramuscular, intradermal, or subcutaneous site is another means to achieve in vivo expression.
In the relevant polynucleotides (e.g., expression vectors), the nucleic acid sequence encoding a CRISPR-associated endonuclease and a guide RNA is operatively linked to a promoter or enhancer-promoter combination. Promoters and enhancers are described above.

In some embodiments, the compositions of the invention can be formulated as a nanoparticle, for example, nanoparticles comprised of a core of high molecular weight linear polyethylenimine (LPEI) complexed with DNA and surrounded by a shell of polyethyleneglycol-modified (PEGylated) low molecular weight LPEI.

The nucleic acids and vectors may also be applied to a surface of a device (e.g., a catheter) or contained within a pump, patch, or other drug delivery device. The nucleic acids and vectors of the invention can be administered alone, or in a mixture, in the presence of a pharmaceutically acceptable excipient or carrier (e.g., physiological saline). The excipient or carrier is selected on the basis of the mode and route of administration. Suitable pharmaceutical carriers, as well as pharmaceutical necessities for use in pharmaceutical formulations, are described in Remington's Pharmaceutical Sciences (E. W. Martin), a well-known reference text in this field, and in the USP/NF (United States Pharmacopeia and the National Formulary).

In some embodiments, the compositions may be formulated as a topical gel for blocking sexual transmission of HIV. The topical gel can be applied directly to the skin or mucous membranes of the male or female genital region prior to sexual activity. Alternatively or in addition, the topical gel can be applied to the surface of, or contained within, a male or female condom or diaphragm.

In some embodiments, the compositions can be formulated as a nanoparticle encapsulating a nucleic acid encoding Cas9 or a variant Cas9 and a guide RNA sequence complementary to a target HIV, or a vector comprising a nucleic acid encoding Cas9 and a guide RNA sequence complementary to a target HIV. Alternatively, the compositions can be formulated as a nanoparticle encapsulating a CRISPR-associated endonuclease polypeptide, e.g., Cas9 or a variant Cas9, and a guide RNA sequence complementary to a target.

The present formulations can encompass a vector encoding Cas9 and a guide RNA sequence complementary to a target HIV. The guide RNA sequence can include a sequence complementary to a single region, e.g., LTR A, B, C, or D, or it can include any combination of sequences complementary to LTR A, B, C, and D. Alternatively, the sequence encoding Cas9 and the sequence encoding the guide RNA sequence can be on separate vectors.
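The phrase "any combination of sequences complementary to LTR A, B, C, and D" can be made concrete by enumeration. The following is a minimal illustrative sketch only; the function name and the `min_size` parameter are hypothetical and are not part of the disclosure. It simply lists the possible sets of target regions, either allowing a single region or requiring two or more.

```python
from itertools import combinations

# Region labels are taken from the text; everything else here is illustrative.
LTR_TARGETS = ("LTR A", "LTR B", "LTR C", "LTR D")

def grna_combinations(targets, min_size=1):
    """Return every combination of target regions with at least min_size members."""
    combos = []
    for size in range(min_size, len(targets) + 1):
        combos.extend(combinations(targets, size))
    return combos

# Single-region and multi-region designs together.
single_or_more = grna_combinations(LTR_TARGETS)
# Designs that use two or more different target regions.
multiplex_only = grna_combinations(LTR_TARGETS, min_size=2)

for combo in multiplex_only:
    print(" + ".join(combo))
```

With four regions there are 15 non-empty combinations, 11 of which use two or more regions.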

Methods of Treatment

The compositions disclosed herein are generally and variously useful for treatment of a subject having a retroviral infection, e.g., an HIV infection. We may refer to a subject, patient, or individual interchangeably. The methods are useful for targeting any HIV, for example, HIV-1, HIV-2, and any circulating recombinant form thereof. A subject is effectively treated whenever a clinically beneficial result ensues. This may mean, for example, a complete resolution of the symptoms of a disease, a decrease in the severity of the symptoms of the disease, or a slowing of the disease's progression. These methods can further include the steps of a) identifying a subject (e.g., a patient and, more specifically, a human patient) who has an HIV infection; and b) providing to the subject a composition comprising a nucleic acid encoding a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to an HIV target sequence, e.g., an HIV LTR. A subject can be identified using standard clinical tests, for example, immunoassays to detect the presence of HIV antibodies or the HIV polypeptide p24 in the subject's serum, or through HIV nucleic acid amplification assays. An amount of such a composition provided to the subject that results in a complete resolution of the symptoms of the infection, a decrease in the severity of the symptoms of the infection, or a slowing of the infection's progression is considered a therapeutically effective amount. The present methods may also include a monitoring step to help optimize dosing and scheduling as well as predict outcome. In some methods of the present invention, one can first determine whether a patient has a latent HIV-1 infection, and then make a determination as to whether or not to treat the patient with one or more of the compositions described herein. Monitoring can also be used to detect the onset of drug resistance and to rapidly distinguish responsive patients from nonresponsive patients.
In some embodiments, the methods can further include the step of determining the nucleic acid sequence of the particular HIV harbored by the patient and then designing the guide RNA to be complementary to those particular sequences. For example, one can determine the nucleic acid sequence of a subject's LTR U3, R or U5 region and then design one or more guide RNAs to be precisely complementary to the patient's sequences.
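As an illustration of how candidate guide RNA target sites might be located once a patient's LTR sequence has been determined, the sketch below scans both strands of a DNA string for 20-nucleotide protospacers immediately followed by an NGG protospacer-adjacent motif. The NGG motif and the 20-nucleotide length are assumptions corresponding to the commonly used Streptococcus pyogenes Cas9 and are not specified in this section; the function names and the toy input are hypothetical, and no off-target screening against the host genome is attempted.

```python
import re

def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def find_protospacers(seq, length=20):
    """List (strand, start, protospacer, pam) for each site followed by NGG.

    Assumption: an S. pyogenes Cas9-style NGG PAM. 'start' is the 0-based
    offset on the strand being scanned, not on the original input for '-'.
    """
    seq = seq.upper()
    hits = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        # A lookahead is used so that overlapping candidate sites are reported.
        pattern = r"(?=([ACGT]{%d})([ACGT]GG))" % length
        for m in re.finditer(pattern, s):
            hits.append((strand, m.start(), m.group(1), m.group(2)))
    return hits

# Toy input for illustration only; this is not an HIV sequence.
toy = "A" * 20 + "TGG"
for hit in find_protospacers(toy):
    print(hit)
```

In practice, each candidate would still need to be checked for uniqueness against the host genome before use.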

The compositions are also useful for the treatment, for example, as a prophylactic treatment, of a subject at risk for having a retroviral infection, e.g., an HIV infection. These methods can further include the steps of a) identifying a subject at risk for having an HIV infection; and b) providing to the subject a composition comprising a nucleic acid encoding a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to an HIV target sequence, e.g., an HIV LTR. A subject at risk for having an HIV infection can be, for example, any sexually active individual engaging in unprotected sex, i.e., engaging in sexual activity without the use of a condom; a sexually active individual having another sexually transmitted infection; an intravenous drug user; or an uncircumcised man. A subject at risk for having an HIV infection can be, for example, an individual whose occupation may bring him or her into contact with HIV-infected populations, e.g., healthcare workers or first responders. A subject at risk for having an HIV infection can be, for example, an inmate in a correctional setting or a sex worker, that is, an individual who uses sexual activity for income, employment, or nonmonetary items such as food, drugs, or shelter.

The compositions can also be administered to a pregnant or lactating woman having an HIV infection in order to reduce the likelihood of transmission of HIV from the mother to her offspring. A pregnant woman infected with HIV can pass the virus to her offspring transplacentally in utero, at the time of delivery through the birth canal, or following delivery, through breast milk. The compositions disclosed herein can be administered to the HIV infected mother either prenatally, perinatally or postnatally during the breast-feeding period, or any combination of prenatal, perinatal, and postnatal administration. Compositions can be administered to the mother along with standard antiretroviral therapies as described below. In some embodiments, the compositions of the invention are also administered to the infant immediately following delivery and, in some embodiments, at intervals thereafter. The infant also can receive standard antiretroviral therapy.

The methods and compositions disclosed herein are useful for the treatment of retroviral infections. Exemplary retroviruses include human immunodeficiency viruses, e.g., HIV-1, HIV-2; simian immunodeficiency virus (SIV); feline immunodeficiency virus (FIV); bovine immunodeficiency virus (BIV); equine infectious anemia virus (EIAV); and caprine arthritis/encephalitis virus (CAEV). The methods disclosed herein can be applied to a wide range of species, e.g., humans, non-human primates (e.g., monkeys), horses or other livestock, dogs, cats, ferrets or other mammals kept as pets, rats, mice, or other laboratory animals.

The methods of the invention can be expressed in terms of the preparation of a medicament. Accordingly, the invention encompasses the use of the agents and compositions described herein in the preparation of a medicament. The compounds described herein are useful in therapeutic compositions and regimens or for the manufacture of a medicament for use in treatment of diseases or conditions as described herein.

Any composition described herein can be administered to any part of the host's body for subsequent delivery to a target cell. A composition can be delivered to, without limitation, the brain, the cerebrospinal fluid, joints, nasal mucosa, blood, lungs, intestines, muscle tissues, skin, or the peritoneal cavity of a mammal. In terms of routes of delivery, a composition can be administered by intravenous, intracranial, intraperitoneal, intramuscular, subcutaneous, intrarectal, intravaginal, intrathecal, intratracheal, intradermal, or transdermal injection, by oral or nasal administration, or by gradual perfusion over time. In a further example, an aerosol preparation of a composition can be given to a host by inhalation.

The dosage required will depend on the route of administration, the nature of the formulation, the nature of the patient's illness, the patient's size, weight, surface area, age, and sex, other drugs being administered, and the judgment of the attending clinicians. Wide variations in the needed dosage are to be expected in view of the variety of cellular targets and the differing efficiencies of various routes of administration. Variations in these dosage levels can be adjusted using standard empirical routines for optimization, as is well understood in the art. Administrations can be single or multiple (e.g., 2- or 3-, 4-, 6-, 8-, 10-, 20-, 50-, 100-, 150-, or more fold). Encapsulation of the compounds in a suitable delivery vehicle (e.g., polymeric microparticles or implantable devices) may increase the efficiency of delivery.

The duration of treatment with any composition provided herein can be any length of time from as short as one day to as long as the life span of the host (e.g., many years). For example, a compound can be administered once a week (for, for example, 4 weeks to many months or years); once a month (for, for example, three to twelve months or for many years); or once a year for a period of 5 years, ten years, or longer. It is also noted that the frequency of treatment can be variable. For example, the present compounds can be administered once (or twice, three times, etc.) daily, weekly, monthly, or yearly.

An effective amount of any composition provided herein can be administered to an individual in need of treatment. The term “effective” as used herein refers to any amount that induces a desired response while not inducing significant toxicity in the patient. Such an amount can be determined by assessing a patient's response after administration of a known amount of a particular composition. In addition, the level of toxicity, if any, can be determined by assessing a patient's clinical symptoms before and after administering a known amount of a particular composition. It is noted that the effective amount of a particular composition administered to a patient can be adjusted according to a desired outcome as well as the patient's response and level of toxicity. Significant toxicity can vary for each particular patient and depends on multiple factors including, without limitation, the patient's disease state, age, and tolerance to side effects.

Any method known to those in the art can be used to determine if a particular response is induced. Clinical methods that can assess the degree of a particular disease state can be used to determine if a response is induced. The particular methods used to evaluate a response will depend upon the nature of the patient's disorder, the patient's age and sex, other drugs being administered, and the judgment of the attending clinician.

The compositions may also be administered with another therapeutic agent, for example, an anti-retroviral agent used in HAART. Exemplary antiretroviral agents include reverse transcriptase inhibitors (e.g., nucleoside/nucleotide reverse transcriptase inhibitors such as zidovudine, emtricitabine, lamivudine and tenofovir; and non-nucleoside reverse transcriptase inhibitors such as efavirenz, nevirapine, rilpivirine); protease inhibitors, e.g., tipranavir, darunavir, indinavir; entry inhibitors, e.g., maraviroc; fusion inhibitors, e.g., enfuvirtide; or integrase inhibitors, e.g., raltegravir, dolutegravir. Exemplary antiretroviral agents can also include multi-class combination agents, for example, combinations of emtricitabine, efavirenz, and tenofovir; combinations of emtricitabine, rilpivirine, and tenofovir; or combinations of elvitegravir, cobicistat, emtricitabine and tenofovir.

Concurrent administration of two or more therapeutic agents does not require that the agents be administered at the same time or by the same route, as long as there is an overlap in the time period during which the agents are exerting their therapeutic effect. Simultaneous or sequential administration is contemplated, as is administration on different days or weeks. The therapeutic agents may be administered under a metronomic regimen, e.g., continuous low-doses of a therapeutic agent.

Dosage, toxicity and therapeutic efficacy of such compositions can be determined by standard pharmaceutical procedures in cell cultures or experimental animals, e.g., for determining the LD₅₀ (the dose lethal to 50% of the population) and the ED₅₀ (the dose therapeutically effective in 50% of the population). The dose ratio between toxic and therapeutic effects is the therapeutic index and it can be expressed as the ratio LD₅₀/ED₅₀.
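The ratio LD₅₀/ED₅₀ can be worked through with a short calculation. The sketch below is illustrative only: the function name is hypothetical, and the dose values are invented for the example and are not data from this disclosure.

```python
def therapeutic_index(ld50, ed50):
    """Return the ratio LD50/ED50; both doses must be in the same units."""
    if ld50 <= 0 or ed50 <= 0:
        raise ValueError("doses must be positive")
    return ld50 / ed50

# Hypothetical doses (same units, e.g., mg/kg), chosen only to show the arithmetic.
print(therapeutic_index(ld50=500.0, ed50=25.0))  # 500 / 25 = 20.0
```

A larger ratio indicates a wider margin between the dose that is effective and the dose that is lethal in half of the population.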

The data obtained from the cell culture assays and animal studies can be used in formulating a range of dosage for use in humans. The dosage of such compositions lies preferably within a range of circulating concentrations that include the ED₅₀ with little or no toxicity. The dosage may vary within this range depending upon the dosage form employed and the route of administration utilized. For any composition used in the method of the invention, the therapeutically effective dose can be estimated initially from cell culture assays. A dose may be formulated in animal models to achieve a circulating plasma concentration range that includes the IC₅₀ (i.e., the concentration of the test compound which achieves a half-maximal inhibition of symptoms) as determined in cell culture. Such information can be used to more accurately determine useful doses in humans. Levels in plasma may be measured, for example, by high performance liquid chromatography.
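One simple way to estimate an IC₅₀ from cell culture dose-response data is to interpolate between the two measured concentrations that bracket 50% inhibition. The sketch below is a hedged illustration, not a method prescribed by this disclosure: it assumes that maximal inhibition is 100% (so that half-maximal is 50%), that concentrations are positive and listed in ascending order, and that interpolation is linear in log₁₀ concentration. The function name and data are hypothetical.

```python
import math

def estimate_ic50(concentrations, inhibition):
    """Interpolate the concentration that gives 50% inhibition.

    Assumptions: concentrations are positive and ascending; inhibition is in
    percent and rises with concentration; interpolation is linear in
    log10(concentration). Returns None if the data never bracket 50%.
    """
    points = list(zip(concentrations, inhibition))
    for (c_lo, i_lo), (c_hi, i_hi) in zip(points, points[1:]):
        if i_lo <= 50.0 <= i_hi and i_hi > i_lo:
            frac = (50.0 - i_lo) / (i_hi - i_lo)
            log_ic50 = math.log10(c_lo) + frac * (math.log10(c_hi) - math.log10(c_lo))
            return 10 ** log_ic50
    return None

# Hypothetical dose-response data for illustration only.
doses = [1.0, 100.0]        # arbitrary concentration units
percent = [0.0, 100.0]      # percent inhibition at each dose
print(estimate_ic50(doses, percent))
```

A fitted sigmoidal model would usually be preferred when enough data points are available; the interpolation above is only a first approximation.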

As described, a therapeutically effective amount of a composition (i.e., an effective dosage) means an amount sufficient to produce a therapeutically (e.g., clinically) desirable result. The compositions can be administered from one or more times per day to one or more times per week, including once every other day. The skilled artisan will appreciate that certain factors can influence the dosage and timing required to effectively treat a subject, including but not limited to the severity of the disease or disorder, previous treatments, the general health and/or age of the subject, and other diseases present. Moreover, treatment of a subject with a therapeutically effective amount of the compositions of the invention can include a single treatment or a series of treatments.

The compositions described herein are suitable for use in a variety of drug delivery systems described above. Additionally, in order to enhance the in vivo serum half-life of the administered compound, the compositions may be encapsulated, introduced into the lumen of liposomes, prepared as a colloid, or other conventional techniques may be employed which provide an extended serum half-life of the compositions. A variety of methods are available for preparing liposomes, as described in, e.g., Szoka, et al., U.S. Pat. Nos. 4,235,871, 4,501,728 and 4,837,028, each of which is incorporated herein by reference. Furthermore, one may administer the drug in a targeted drug delivery system, for example, in a liposome coated with a tissue-specific antibody. The liposomes will be targeted to and taken up selectively by the organ.

Also provided are methods of inactivating a retrovirus, for example a lentivirus such as a human immunodeficiency virus, a simian immunodeficiency virus, a feline immunodeficiency virus, or a bovine immunodeficiency virus, in a mammalian cell. The human immunodeficiency virus can be HIV-1 or HIV-2. The human immunodeficiency virus can be a chromosomally integrated provirus. The mammalian cell can be any cell type infected by HIV, including, but not limited to, CD4+ lymphocytes, macrophages, fibroblasts, monocytes, T lymphocytes, B lymphocytes, natural killer cells, dendritic cells such as Langerhans cells and follicular dendritic cells, hematopoietic stem cells, endothelial cells, brain microglial cells, and gastrointestinal epithelial cells. Such cell types include those cell types that are typically infected during a primary infection, for example, a CD4+ lymphocyte, a macrophage, or a Langerhans cell, as well as those cell types that make up latent HIV reservoirs, i.e., a latently infected cell.

The methods can include exposing the cell to a composition comprising an isolated nucleic acid encoding a gene editing complex comprising a CRISPR-associated endonuclease and one or more guide RNAs wherein the guide RNA is complementary to a target nucleic acid sequence in the retrovirus. In a preferred embodiment, as previously described, the method of inactivating a proviral DNA integrated into the genome of a host cell latently infected with a retrovirus includes the steps of treating the host cell with a composition comprising a CRISPR-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in the proviral DNA; and inactivating the proviral DNA. The at least two gRNAs can be configured as a single sequence or as a combination of one or more different sequences, e.g., a multiplex configuration. Multiplex configurations can include combinations of two, three, four, five, six, seven, eight, nine, ten, or more different gRNAs, for example any combination of sequences in U3, R, or U5. In some embodiments, combinations of LTR A, LTR B, LTR C and LTR D can be used. In some embodiments, combinations of any of the sequences LTR A (SEQ ID NO: 96), LTR B (SEQ ID NO: 121), LTR C (SEQ ID NO: 87), and LTR D (SEQ ID NO: 110) can be used. In experiments described in the Examples, the use of two different gRNAs caused the excision of the viral sequences between the cleavage sites recognized by the CRISPR endonuclease. The excised region can include the entire HIV-1 genome. The treating step can take place in vivo, that is, the compositions can be administered directly to a subject having HIV infection. The methods are not so limited, however, and the treating step can take place ex vivo. For example, a cell or plurality of cells, or a tissue explant, can be removed from a subject having an HIV infection and placed in culture, and then treated with a composition comprising a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus. As described above, the composition can be a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus; an expression vector comprising the nucleic acid sequence; or a pharmaceutical composition comprising a nucleic acid encoding a CRISPR-associated endonuclease and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus; or an expression vector comprising the nucleic acid sequence. In some embodiments, the gene editing complex can comprise a CRISPR-associated endonuclease polypeptide and a guide RNA wherein the guide RNA is complementary to the nucleic acid sequence in the human immunodeficiency virus.

Regardless of whether compositions are administered as nucleic acids or polypeptides, they are formulated in such a way as to promote uptake by the mammalian cell. Useful vector systems and formulations are described above. In some embodiments the vector can deliver the compositions to a specific cell type. The invention is not so limited, however, and other methods of DNA delivery such as chemical transfection, using, for example, calcium phosphate, DEAE dextran, liposomes, lipoplexes, surfactants, and perfluoro chemical liquids are also contemplated, as are physical delivery methods, such as electroporation, microinjection, ballistic particles, and “gene gun” systems.

Standard methods, for example, immunoassays to detect the CRISPR-associated endonuclease, or nucleic acid-based assays such as PCR to detect the gRNA, can be used to confirm that the complex has been taken up and expressed by the cell into which it has been introduced. The engineered cells can then be reintroduced into the subject from whom they were derived as described below.

The gene editing complex comprises a CRISPR-associated nuclease, e.g., Cas9, and a guide RNA complementary to the retroviral target sequence, for example, an HIV target sequence. The gene editing complex can introduce various mutations into the proviral DNA. The mechanism by which such mutations inactivate the virus can vary; for example, the mutation can affect proviral replication, viral gene expression or proviral excision. The mutations may be located in regulatory sequences or structural gene sequences and result in defective production of HIV. The mutation can comprise a deletion. The size of the deletion can vary from a single nucleotide base pair to about 10,000 base pairs. In some embodiments, the deletion can include all or substantially all of the proviral sequence. In some embodiments the deletion can include the entire proviral sequence. The mutation can comprise an insertion, that is, the addition of one or more nucleotide base pairs to the proviral sequence. The size of the inserted sequence also may vary, for example from about one base pair to about 300 nucleotide base pairs. The mutation can comprise a point mutation, that is, the replacement of a single nucleotide with another nucleotide. Useful point mutations are those that have functional consequences, for example, mutations that result in the conversion of an amino acid codon into a termination codon or that result in the production of a nonfunctional protein.

In exemplary multiplex methods for inactivating proviral DNA integrated into the genome of a host cell, as demonstrated in Examples 2-5, two different gRNA sequences are deployed, with each gRNA sequence targeting a different site in the proviral DNA. That is, the methods include the steps of exposing the host cell to a composition including an isolated nucleic acid encoding a CRISPR-associated endonuclease; an isolated nucleic acid sequence encoding a first gRNA having a first spacer sequence that is complementary to a first target protospacer sequence in a proviral DNA; and an isolated nucleic acid encoding a second gRNA having a second spacer sequence that is complementary to a second target protospacer sequence in the proviral DNA; expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA; assembling, in the host cell, a first gene editing complex including the CRISPR-associated endonuclease and the first gRNA; and a second gene editing complex including the CRISPR-associated endonuclease and the second gRNA; directing the first gene editing complex to the first target protospacer sequence by complementary base pairing between the first spacer sequence and the first target protospacer sequence; directing the second gene editing complex to the second target protospacer sequence by complementary base pairing between the second spacer sequence and the second target protospacer sequence; cleaving the proviral DNA at the first target protospacer sequence with the CRISPR-associated endonuclease; cleaving the proviral DNA at the second target protospacer sequence with the CRISPR-associated endonuclease; and inducing at least one mutation in the proviral DNA. The same multiplex method is readily incorporated into methods for treating a subject having a human immunodeficiency virus, and for reducing the risk of a human immunodeficiency virus infection.

It will be understood that the term “composition” can include not only a mixture of components, but also separate components that are not necessarily administered simultaneously. As a non-limiting example, a composition according to the present invention can include separate component preparations of nucleic acid sequences encoding a Cas9 nuclease, a first gRNA, and a second gRNA, with each component being administered sequentially in an infusion, during a time frame that results in a host cell being exposed to all three components.
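
The outcome of a two-gRNA method, in which cleavage at two protospacers removes the intervening sequence, can be illustrated with a short Python sketch. This is a toy model under stated assumptions, not the procedure of the Examples: the cut is assumed to fall 3 bp upstream of the PAM (typical of Streptococcus pyogenes Cas9, and not stated in this section), the two ends are assumed to rejoin without additional indels, and the sequences and function names are hypothetical.

```python
def cut_index(seq: str, protospacer: str) -> int:
    """Index of the assumed blunt cut: 3 bp upstream of the PAM,
    i.e. 3 bp before the 3' end of the protospacer."""
    start = seq.find(protospacer)
    if start < 0:
        raise ValueError("protospacer not found in sequence")
    return start + len(protospacer) - 3


def excise(seq: str, first: str, second: str):
    """Return (remaining, excised) after cutting at both protospacers
    and joining the outer ends (idealized, no indels at the junction)."""
    a, b = sorted((cut_index(seq, first), cut_index(seq, second)))
    return seq[:a] + seq[b:], seq[a:b]


# Hypothetical 20-nt protospacers, each followed by an NGG PAM.
first_site = "GATTACAGATTACAGATTAC"
second_site = "CCATGGTTAACCGGTTAAGC"
toy_provirus = "AAAA" + first_site + "TGG" + "T" * 30 + second_site + "AGG" + "CCCC"

remaining, excised = excise(toy_provirus, first_site, second_site)
print(len(toy_provirus), len(remaining), len(excised))
```

In this toy model the excised fragment spans everything between the two cut positions, which mirrors the statement that the deleted region can extend across the sequence lying between the two target sites.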

In other embodiments, the compositions comprise a cell which has been transformed or transfected with one or more Cas/gRNA vectors. In some embodiments, the methods of the invention can be applied ex vivo. That is, a subject's cells can be removed from the body and treated with the compositions in culture to excise HIV sequences and the treated cells returned to the subject's body. The cells can be the subject's own cells, or they can be haplotype matched or a cell line. The cells can be irradiated to prevent replication. In some embodiments, the cells are human leukocyte antigen (HLA)-matched, autologous, cell lines, or combinations thereof. In other embodiments the cells can be stem cells, for example, embryonic stem cells or artificial pluripotent stem cells (induced pluripotent stem cells (iPS cells)). Embryonic stem cells (ES cells) and artificial pluripotent stem cells (induced pluripotent stem cells, iPS cells) have been established from many animal species, including humans. These types of pluripotent stem cells would be the most useful source of cells for regenerative medicine because these cells are capable of differentiation into almost all of the organs by appropriate induction of their differentiation, while retaining their ability to divide actively and maintaining their pluripotency. iPS cells, in particular, can be established from self-derived somatic cells, and therefore are not likely to cause ethical and social issues, in comparison with ES cells, which are produced by destruction of embryos. Further, iPS cells, which are self-derived cells, make it possible to avoid rejection reactions, which are the biggest obstacle to regenerative medicine or transplantation therapy.

The gRNA expression cassette can be easily delivered to a subject by methods known in the art, for example, methods which deliver siRNA. In some aspects, the Cas may be a fragment wherein the active domains of the Cas molecule are included, thereby cutting down on the size of the molecule. Thus, the Cas9/gRNA molecules can be used clinically, similar to the approaches taken by current gene therapy. In particular, a Cas9/multiplex gRNA stable expression stem cell or iPS cells for cell transplantation therapy as well as HIV-1 vaccination will be developed for use in subjects.

Transduced cells are prepared for reinfusion according to established methods. After a period of about 2-4 weeks in culture, the cells may number between 1×10⁶ and 1×10¹⁰. In this regard, the growth characteristics of cells vary from patient to patient and from cell type to cell type. About 72 hours prior to reinfusion of the transduced cells, an aliquot is taken for analysis of phenotype, and percentage of cells expressing the therapeutic agent. For administration, cells of the present invention can be administered at a rate determined by the LD₅₀ of the cell type, and the side effects of the cell type at various concentrations, as applied to the mass and overall health of the patient. Administration can be accomplished via single or divided doses. Adult stem cells may also be mobilized using exogenously administered factors that stimulate their production and egress from tissues or spaces that may include, but are not restricted to, bone marrow or adipose tissues.

Articles of Manufacture

The compositions described herein can be packaged in suitable containers labeled, for example, for use as a therapy to treat a subject having a retroviral infection, for example, an HIV infection, or a subject at risk for contracting a retroviral infection, for example, an HIV infection. The containers can include a composition comprising a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, and one or more of a suitable stabilizer, carrier molecule, flavoring, and/or the like, as appropriate for the intended use. Accordingly, packaged products (e.g., sterile containers containing one or more of the compositions described herein and packaged for storage, shipment, or sale at concentrated or ready-to-use concentrations) and kits, including at least one composition of the invention, e.g., a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, and instructions for use, are also within the scope of the invention. A product can include a container (e.g., a vial, jar, bottle, bag, or the like) containing one or more compositions of the invention. In addition, an article of manufacture further may include, for example, packaging materials, instructions for use, syringes, delivery devices, buffers or other control reagents for treating or monitoring the condition for which prophylaxis or treatment is required.

In some embodiments, the kits can include one or more additional antiretroviral agents, for example, a reverse transcriptase inhibitor, a protease inhibitor or an entry inhibitor. The additional agents can be packaged together in the same container as a nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, or they can be packaged separately. The nucleic acid sequence encoding a CRISPR-associated endonuclease, for example, a Cas9 endonuclease, and a guide RNA complementary to a target sequence in a human immunodeficiency virus, or a vector encoding that nucleic acid, and the additional agent may be combined just before use or administered separately.

The product may also include a legend (e.g., a printed label or insert or other medium describing the product's use (e.g., an audio- or videotape)). The legend can be associated with the container (e.g., affixed to the container) and can describe the manner in which the compositions therein should be administered (e.g., the frequency and route of administration), indications therefor, and other uses. The compositions can be ready for administration (e.g., present in dose-appropriate units), and may include one or more additional pharmaceutically acceptable adjuvants, carriers or other diluents and/or an additional therapeutic agent. Alternatively, the compositions can be provided in a concentrated form with a diluent and instructions for dilution.

Example 1: Materials and Methods

Plasmid Preparation:

Vectors containing human Cas9 and a gRNA expression cassette, pX260 and pX330 (Addgene), were utilized to create various constructs, LTR-A, B, C, and D.

Cell Culture and Stable Cell Lines:

TZM-bl reporter and U1 cell lines were obtained from the NIH AIDS Reagent Program and CHME5 microglial cells are known in the art.

Immunohistochemistry and Western Blot:

Standard methods for immunocytochemical observation of the cells and evaluation of protein expression by Western blot were utilized.

Firefly-Luciferase Assay:

Cells were lysed 24 h post-treatment using Passive Lysis Buffer (Promega) and assayed with a Luciferase Reporter Gene Assay kit (Promega) according to the manufacturer's protocol. Luciferase activity was normalized to the number of cells determined by a parallel MTT assay (Vybrant, Invitrogen).

p24 ELISA:

After infection or reactivation, the levels of HIV-1 viral load in the supernatants were quantified by p24 Gag ELISA (Advanced BioScience Laboratories, Inc) following the manufacturer's protocol. To assess cell viability upon treatments, an MTT assay was performed in parallel according to the manufacturer's manual (Vybrant, Invitrogen).

EGFP Flow Cytometry:

Cells were trypsinized, washed with PBS and fixed in 2% paraformaldehyde for 10 min at room temperature, then washed twice with PBS and analyzed using a Guava EasyCyte Mini flow cytometer (Guava Technologies).

HIV-1 Reporter Virus Preparation and Infections:

HEK293T cells were transfected using Lipofectamine 2000 reagent (Invitrogen) with pNL4-3-ΔE-EGFP (NIH AIDS Research and Reference Reagent Program). After 48 h, the supernatant was collected, 0.45 μm filtered and titered in HeLa cells using EGFP as an infection marker. For viral infection, stable Cas9/gRNA TZM-bl cells were incubated 2 h with diluted viral stock, and then washed twice with PBS. At 2 and 4 d post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression, or genomic DNA purification was performed for PCR and whole genome sequencing.

Genomic DNA Amplification, PCR, TA-Cloning, and Sanger Sequencing, GenomeWalker Link PCR:

Standard methods for DNA manipulation for cloning and sequencing were utilized. For identification of the integration sites of HIV-1, a Lenti-X™ Integration Site Analysis kit was used.

Surveyor Assay:

The presence of mutations in PCR products was examined using a SURVEYOR Mutation Detection Kit (Transgenomic) according to the protocol from the manufacturer. Briefly, heterogeneous PCR product was denatured for 10 min at 95° C. and hybridized by gradual cooling using a thermocycler. Next, 300 ng of hybridized DNA (9 μl) was subjected to digestion with 0.25 μl of SURVEYOR Nuclease in the presence of 0.25 μl SURVEYOR Enhancer S and 15 mM MgCl₂ for 4 h at 42° C. Then Stop Solution was added and samples were resolved in 2% agarose gel together with equal amounts of undigested PCR product controls.

Some PCR products were used for restriction fragment length polymorphism analysis. Equal amounts of the PCR products were digested with BsaJI. Digested DNA was separated on an ethidium bromide-containing agarose gel (2%). For sequencing, PCR products were cloned using a TA Cloning® Kit Dual Promoter with pCR™II vector (Invitrogen). The insert was confirmed by digestion with EcoRI and positive clones were sent to Genewiz for Sanger sequencing.

Selection of LTR Target Sites, Whole Genome Sequencing and Bioinformatics and Statistical Analysis.

We utilized Jack Lin's CRISPR/Cas9 gRNA finder tool for initial identification of potential target sites within the LTR.

Plasmid Preparation.

A DNA segment expressing LTR-A or LTR-B for pre-crRNA was cloned into the pX260 vector that contains the puromycin selection gene (Addgene, plasmid #42229). DNA segments expressing LTR-C or LTR-D for the chimeric crRNA-tracrRNA were cloned into the pX330 vector (Addgene, plasmid #42230). Both vectors contain a humanized Cas9 coding sequence driven by a CAG promoter and a gRNA expression cassette driven by a human U6 promoter. The vectors were digested with BbsI and treated with Antarctic Phosphatase, and the linearized vector was purified with a Quick nucleotide removal kit (Qiagen). A pair of oligonucleotides for each targeting site (FIG. 14, AlphaDNA) was annealed, phosphorylated, and ligated to the linearized vector. The gRNA expression cassette was sequenced with U6 sequencing primer (FIG. 14) at GENEWIZ. For pX330 vectors, we designed a pair of universal PCR primers with overhang digestion sites (FIG. 14) that can tease out the gRNA expression cassette (U6-gRNA-crRNA-stem-tracrRNA) for direct transfection or subcloning to other vectors.

Cell Culture.

TZM-bl reporter cell line from Dr. John C. Kappes, Dr. Xiaoyun Wu and Tranzyme Inc, U1/HIV-1 cell line from Dr. Thomas Folks and J-Lat full length clone from Dr. Eric Verdin were obtained through the NIH AIDS Reagent Program, Division of AIDS, NIAID, NIH. The CHME5/HIV fetal microglia cell line was generated as previously described. TZM-bl and CHME5 cells were cultured in Dulbecco's minimal essential medium high glucose supplemented with 10% heat-inactivated fetal bovine serum (FBS) and 1% penicillin/streptomycin. U1 and J-Lat cells were cultured in RPMI 1640 containing 2.0 mM L-glutamine, 10% FBS and 1% penicillin/streptomycin.

Stable Cell Lines and Subcloning.

TZM-bl or CHME5/HIV cells were seeded in 6-well plates at 1.5×10⁵ cells/well and transfected using Lipofectamine 2000 reagent (Invitrogen) with 1 μg of pX260 (for LTR-A and B) or 1 μg/0.1 μg of pX330/pX260 (for LTR-C and D) plasmids. The next day, cells were transferred into 100-mm dishes and incubated with growth medium containing 1 μg/ml of puromycin (Sigma). Two weeks later, surviving cell colonies were isolated using cloning cylinders (Corning). U1 cells (1.5×10⁵) were electroporated with 1 μg of DNA using a 10 μl tip and 3×10 ms 1400 V impulses with the Neon™ Transfection System (Invitrogen). Cells were selected with 0.5 μg/ml of puromycin for two weeks. The stable clones were subcultured using a limited dilution method in 96-well plates and single cell-derived subclones were maintained for further studies.

Immunocytochemistry and Western Blot.

The Cas9/gRNA stable expression TZM-bl cells were cultured in 8-well chamber slides for 2 days and fixed for 10 min in 4% paraformaldehyde/PBS. After three rinses, the cells were treated with 0.5% Triton X-100/PBS for 20 min and blocked in 10% donkey serum for 1 h. Cells were incubated overnight at 4° C. with mouse anti-Flag M2 primary antibody (1:500, Sigma). After rinsing three times, cells were incubated for 1 h with donkey anti-mouse Alexa-Fluor-594 secondary antibodies, and incubated with Hoechst 33258 for 5 min. After three rinses with PBS, the cells were coverslipped with anti-fading aqueous mounting media (Biomeda) and analyzed under a Leica DMI6000B fluorescence microscope.

TZM-bl cells cultured in 6-well plates were solubilized in 200 μl of Triton X-100-based lysis buffer containing 20 mM Tris-HCl (pH 7.4), 1% Triton X-100, 5 mM ethylenediaminetetraacetic acid, 5 mM dithiothreitol, 150 mM NaCl, 1 mM phenylmethylsulfonyl fluoride, 1× nuclear extraction proteinase inhibitor cocktail (Cayman Chemical, Ann Arbor, Mich.), 1 mM sodium orthovanadate and 30 mM NaF. Cell lysates were rotated at 4° C. for 30 min. Nuclear and cellular debris was cleared by centrifugation at 20,000 g for 20 min at 4° C. Equal amounts of lysate proteins (20 μg) were denatured by boiling for 5 min in sodium dodecyl sulphate (SDS) sample buffer, fractionated by SDS-polyacrylamide gel electrophoresis in tris-glycine buffer, and transferred to nitrocellulose membrane (BioRad). The SeeBlue prestained standards (Invitrogen) were used as a molecular weight reference. Blots were blocked in 5% BSA/tris-buffered saline (pH 7.6) plus 0.1% Tween-20 (TBS-T) for 1 h and then incubated overnight at 4° C. with mouse anti-Flag M2 monoclonal antibody (1:1000, Sigma) or mouse anti-GAPDH monoclonal antibody (1:3000, Santa Cruz Biotechnology). After washing with TBS-T, the blots were incubated with IRDye 680LT-conjugated anti-mouse antibody for 1 h at room temperature. Membranes were scanned and analyzed using an Odyssey Infrared Imaging System (LI-COR Biosciences).

Firefly-Luciferase Assay.

Cells were lysed 24 h post-treatment using Passive Lysis Buffer (Promega) and assayed with a Luciferase Reporter Gene Assay kit (Promega) according to the protocol of the manufacturer. Luciferase activity was normalized to the number of cells determined by a parallel MTT assay (Vybrant, Invitrogen).

p24 ELISA.

After infection or reactivation, the HIV-1 viral load levels in the supernatants were quantified by p24 Gag ELISA (Advanced BioScience Laboratories, Inc) following the manufacturer's protocol. To assess cell viability upon treatments, an MTT assay was performed in parallel according to the manufacturer's protocol (Vybrant, Invitrogen).

EGFP Flow Cytometry.

Cells were trypsinized, washed with PBS and fixed in 2% paraformaldehyde for 10 min at room temperature, then washed twice with PBS and analyzed using a Guava EasyCyte Mini flow cytometer (Guava Technologies).

HIV-1 Reporter Virus Preparation and Infections.

HEK293T cells were transfected using Lipofectamine 2000 reagent (Invitrogen) with pNL4-3-ΔE-EGFP, SF162 and JRFL (NIH AIDS Research and Reference Reagent Program). For pseudotyped pNL4-3-ΔE-EGFP, the VSVG vector was cotransfected. After 48 h, the supernatant was collected, 0.45 μm filtered and titered in HeLa cells using expressed EGFP as an infection marker. For viral infection, stable Cas9/gRNA TZM-bl cells were incubated 2 h with a diluted viral stock, and washed twice with PBS. At 2 and 4 days post-infection, cells were collected, fixed and analyzed by flow cytometry for EGFP expression, or genomic DNA purification was performed for PCR and whole genome sequencing.

Genomic DNA Purification, PCR, TA-Cloning and Sanger Sequencing.

Genomic DNA was isolated from cells using an ArchivePure DNA cell/tissue purification kit (5 PRIME) according to the protocol recommended by the manufacturer. One hundred ng of extracted DNA were subjected to PCR using a high-fidelity FailSafe PCR kit (Epicentre) with the primers listed in FIG. 14. Three steps of standard PCR were carried out for 30 cycles with 55° C. annealing and 72° C. extension. The products were resolved in 2% agarose gel. The bands of interest were gel-purified and cloned into pCRII T-A vector (Invitrogen), and the nucleotide sequence of individual clones was determined by sequencing at Genewiz using universal T7 and/or SP6 primers.

Conventional and Real-Time Reverse Transcription (RT)-PCR.

For total RNA extraction, cells were processed with an RNeasy Mini kit (Qiagen) as per the manufacturer's instructions. The potentially residual genomic DNA was removed through on-column DNase digestion with an RNase-Free DNase Set (Qiagen). One μg of RNA for each sample was reverse transcribed into cDNA using random hexanucleotide primers with a High Capacity cDNA Reverse Transcription Kit (Invitrogen, Grand Island, N.Y.). Conventional PCR was performed using a standard protocol. Quantitative PCR (qPCR) analyses were carried out in a LightCycler480 (Roche) using an SYBR® Green PCR Master Mix Kit (Applied Biosystems). The RT reactions were diluted to 5 ng of total RNA per microliter of reaction and 2 μl was used in a 20-μl PCR reaction. For qPCR analysis of HIV-1 proviruses, 50 ng of genomic DNA were used. The primers were synthesized by AlphaDNA and are shown in FIG. 14. The primers for human housekeeping genes GAPDH and RPL13A were obtained from RealTimePrimers (Elkins Park, Pa.). Each sample was tested in triplicate. Cycle threshold (Ct) values were obtained graphically for the target genes and housekeeping genes. The difference in Ct values between the housekeeping gene and target gene was represented as ΔCt values. The ΔΔCt values were obtained by subtracting the ΔCt values of control samples from those of experimental samples. Relative fold or percentage change was calculated as 2^(−ΔΔCt). In some cases, absolute quantification was performed using the pNL4-3-ΔE-EGFP plasmid spiked in human genomic DNA as a standard. The number of HIV-1 viral copies was calculated based on the standard curve after normalization with the housekeeping gene.
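
The relative quantification described above can be sketched in Python as follows. This is an illustrative sketch only: it assumes the common convention ΔCt = Ct(target) − Ct(housekeeping), which the text does not state explicitly, it averages triplicate Ct values before subtraction, and the function names and Ct values are hypothetical.

```python
from statistics import mean


def delta_ct(target_cts, housekeeping_cts):
    """ΔCt under the assumed convention: mean target Ct minus mean housekeeping Ct."""
    return mean(target_cts) - mean(housekeeping_cts)


def relative_fold_change(exp_target, exp_housekeeping, ctrl_target, ctrl_housekeeping):
    """Relative fold change 2^(-ΔΔCt), with ΔΔCt = ΔCt(experimental) - ΔCt(control)."""
    ddct = delta_ct(exp_target, exp_housekeeping) - delta_ct(ctrl_target, ctrl_housekeeping)
    return 2.0 ** (-ddct)


# Hypothetical triplicate Ct values, for illustration only.
control_target = [24.0, 24.0, 24.0]
control_gapdh = [18.0, 18.0, 18.0]
treated_target = [27.0, 27.0, 27.0]
treated_gapdh = [18.0, 18.0, 18.0]

fold = relative_fold_change(treated_target, treated_gapdh, control_target, control_gapdh)
print(f"relative expression: {fold:.3f} ({fold * 100:.1f}% of control)")
```

With these hypothetical values the target Ct rises by three cycles relative to the housekeeping gene, which corresponds to an eightfold reduction under this convention.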

GenomeWalker Link PCR and Long-Range PCR.

The integration sites of HIV-1 in host cells were identified using a Lenti-X™ Integration Site Analysis kit (Clontech) following the manufacturer's instructions. Briefly, high quality genomic DNAs were extracted from U1 cells using a NucleoSpin Tissue kit (Clontech). To construct the viral integration libraries, each genomic DNA sample was digested with the blunt-end-generating digestion enzymes DraI, SspI or HpaI separately overnight at 37° C. The digestion efficiency was verified by electrophoresis on 0.6% agarose. The digested DNA was purified using a NucleoSpin Gel and PCR Clean-Up kit followed by ligation of the digested genomic DNA fragments to GenomeWalker™ Adaptor at 16° C. overnight. The ligation reaction was stopped by incubation at 70° C. for 5 min and diluted 5 times with TE buffer. The primary PCR was performed on the DNA segments with adaptor primer 1 (AP1) and LTR-specific primer 1 (LSP1) using Advantage 2 Polymerase Mix followed by a secondary (nested) PCR using AP2 and LSP2 primers (FIG. 14). The secondary PCR products were separated on 1.5% ethidium bromide-containing agarose gel. The major bands were gel-purified and cloned into pCRII T-A vector (Invitrogen), and the nucleotide sequence of individual clones was determined by sequencing at Genewiz using universal T7 and SP6 primers. The sequence reads were analyzed by NCBI BLAST searching. Two integration sites of HIV-1 in U1 cells were identified in chromosomes X and 2. A pair of primers covering each integration site (FIG. 14) was synthesized by AlphaDNA. Long-range PCR using the U1 genomic DNA was performed with a Phusion High-Fidelity PCR kit (New England Biolabs) following the manufacturer's protocol. The PCR products were visualized on 1% agarose gel and validated by Sanger sequencing.

Surveyor Assay.

The presence of mutations in PCR products was tested using a SURVEYOR Mutation Detection Kit (Transgenomic) according to the protocol of the manufacturer. Briefly, heterogeneous PCR products were denatured for 10 min at 95° C. and hybridized by gradual cooling using a thermocycler. Next, 300 ng of hybridized DNA (9 μl) was subjected to digestion with 0.25 μl of SURVEYOR Nuclease in the presence of 0.25 μl SURVEYOR Enhancer S and 15 mM MgCl₂ for 4 h at 42° C. Then Stop Solution was added and samples were resolved in 2% agarose gel together with equal amounts of undigested PCR products.

Some PCR products were used for restriction fragment length polymorphism analysis. Equal amounts of PCR products were digested with BsaJI. Digested DNA was separated on an ethidium bromide-containing agarose gel (2%). For sequencing, PCR products were cloned using a TA Cloning® Kit Dual Promoter with pCR™II vector (Invitrogen). The insert was confirmed by digestion with EcoRI and positive clones were sent to Genewiz for Sanger sequencing.

Selection of LTR Target Sites and Prediction of Potential Off-Target Sites.

For initial studies, we obtained the LTR promoter sequence (−411 to −10) of the integrated lentiviral LTR-luciferase reporter by TA-cloning and sequencing of PCR products from the genome of human TZM-bl cells, because the LTR may have mutated during passaging. This promoter sequence is a 100% match to the 5′-LTR of the pHR′-CMV-LacZ lentiviral vector (AF105229). Thus, sense and antisense sequences of the full-length pHR′ 5′-LTR (634 bp) were used to search for Cas9/gRNA target sites containing a 20-bp gRNA targeting sequence plus the PAM sequence (NRG) using Jack Lin's CRISPR/Cas9 gRNA finder tool. The number of potential off-targets with an exact match was predicted by blasting each gRNA targeting sequence plus NRG (AGG, TGG, GGG and CGG; AAG, TAG, GAG, CAG) against all available human genomic and transcript sequences using the NCBI/blastn suite with an E-value cutoff of 1,000 and a word size of 7. Within the BLAST results, the target sequence (nucleotides 1-23 through 9-23) was searched with the browser's find function (Control+F, then copy/paste) to count the genomic targets with a 100% match to the target sequence. The number of off-targets for each search was divided by 3 because of the repeated genome library.
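The target-site search described above can be illustrated with a minimal Python sketch. It is not the gRNA finder tool named in the text; it only assumes that a candidate site is any 20-nt stretch immediately followed by an NRG PAM on either strand. The function names and the example sequence are hypothetical.

```python
import re

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_target_sites(seq, spacer_len=20):
    """List (strand, start, protospacer, pam) for every spacer_len-nt
    stretch immediately followed by an NRG PAM (N = any base, R = A or G).

    Assumption: 'start' is a 0-based offset on the strand being scanned,
    so for '-' hits it refers to the reverse-complemented sequence.
    """
    # A lookahead is used so that overlapping candidate sites are all reported.
    pattern = re.compile(r"(?=([ACGT]{%d})([ACGT][AG]G))" % spacer_len)
    sites = []
    for strand, strand_seq in (("+", seq.upper()), ("-", reverse_complement(seq))):
        for match in pattern.finditer(strand_seq):
            sites.append((strand, match.start(), match.group(1), match.group(2)))
    return sites


if __name__ == "__main__":
    # Hypothetical toy sequence, not an HIV-1 LTR.
    for site in find_target_sites("A" * 20 + "TGG"):
        print(site)
```

Scanning both the sequence and its reverse complement mirrors the use of sense and antisense LTR sequences in the text; ranking candidates by predicted efficiency is outside the scope of this sketch.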

Whole Genome Sequencing and Bioinformatics Analysis.

The control subclone C1 and experimental subclone AB7 of TZM-bl cells were validated for target cut efficiency and functional suppression of the LTR-luciferase reporter. The genomic DNA was isolated with a NucleoSpin Tissue kit (Clontech). The DNA samples were submitted to the NextGen sequencing facility at Temple University Fox Chase Cancer Center. Duplicate genomic DNA libraries were prepared from each subclone using a NEBNext Ultra DNA Library Prep Kit for Illumina (New England Biolabs) following the manufacturer's instructions. All libraries were sequenced with paired-end 141-bp reads in two Illumina Rapid Run flowcells on a HiSeq 2500 instrument (Illumina). Demultiplexed read data from the sequenced libraries were sent to AccuraScience, LLC for professional bioinformatics analysis. Briefly, the raw reads were mapped against the human genome (hg19) and the HIV-1 genome using Bowtie2. A genomic analysis toolkit (GATK, version 2.8.1) was used for duplicate read removal, local alignment, base quality recalibration and indel calling. Confidence scores of 10 and 30 were the thresholds for low quality (LowQual) and high confidence calling (PASS), respectively. The potential off-target sites of LTR-A and LTR-B with various mismatches were predicted by the NCBI/blastn suite as described above and by a CRISPR Design Tool. All the potential gRNA target sites (FIG. 15) were mapped against the ±300 bp regions around each indel identified by GATK. The locations of the overlapping regions in the human genome and HIV-1 genome were compared between the control C1 and experimental AB7.
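The last step, comparing potential gRNA target sites with the ±300 bp region around each indel, can be sketched as follows. This is an illustrative simplification, not the pipeline actually used: it assumes the reference is available as an in-memory string, that indel positions are 0-based, and that each target site is given as a concrete sequence. All names are hypothetical.

```python
# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def flanking_window(reference, pos, flank=300):
    """Return the reference bases from pos - flank to pos + flank (inclusive),
    clipped at the ends of the reference."""
    start = max(0, pos - flank)
    return reference[start:pos + flank + 1].upper()


def indel_overlaps_site(reference, pos, target_sites, flank=300):
    """True if any target site, on either strand, lies entirely inside the
    flanking window around the indel at 0-based position pos."""
    window = flanking_window(reference, pos, flank)
    for site in target_sites:
        site = site.upper()
        if site in window or reverse_complement(site) in window:
            return True
    return False


if __name__ == "__main__":
    # Hypothetical toy reference with one embedded seed+PAM site.
    site = "GATTACAGATTACTGG"
    reference = "A" * 500 + site + "A" * 500
    print(indel_overlaps_site(reference, 510, [site]))
    print(indel_overlaps_site(reference, 100, [site]))
```

An indel whose window contains no predicted site is unlikely to be gRNA-induced under this criterion, which is the logic the comparison between C1 and AB7 relies on.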

Statistical Analysis.

The quantitative data represent mean ± standard deviation from 3-5 independent experiments, and were evaluated by Student's t-test or by ANOVA with the Newman-Keuls multiple comparison test. A p value <0.05 or <0.01 was considered a statistically significant difference.
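As a minimal sketch of the two-sample comparison, the following Python code computes mean ± standard deviation and the pooled-variance Student's t statistic using only the standard library. The replicate values are hypothetical placeholders, not data from the experiments, and converting the statistic to a p value (which requires the t distribution) is left to a statistics package.

```python
from math import sqrt
from statistics import mean, stdev, variance


def summarize(values):
    """Return (mean, sample standard deviation) of replicate measurements."""
    return mean(values), stdev(values)


def student_t(group_a, group_b):
    """Return (t statistic, degrees of freedom) for an unpaired,
    equal-variance (pooled) Student's t-test."""
    n_a, n_b = len(group_a), len(group_b)
    dof = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / dof
    t_stat = (mean(group_a) - mean(group_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t_stat, dof


if __name__ == "__main__":
    # Hypothetical replicate values from three independent experiments per group.
    control = [1.0, 2.0, 3.0]
    treated = [2.0, 3.0, 4.0]
    print(summarize(control), summarize(treated))
    print(student_t(control, treated))
```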

Example 2: Cas9/LTR-gRNA Suppresses HIV-1 Reporter Virus Production in CHME5 Microglial Cells Latently Infected with HIV-1

We assessed the ability of HIV-1-directed guide RNAs (gRNAs) to abrogate LTR transcriptional activity and eradicate proviral DNA from the genomes of latently-infected myeloid cells that serve as HIV-1 reservoirs in the brain, a particularly intractable target population. Our strategy was focused on targeting the HIV-1 LTR promoter U3 region. By bioinformatic screening and efficiency/off-target prediction, we identified four gRNA targets (protospacers; LTRs A-D) that avoid conserved transcription factor binding sites, minimizing the likelihood of altering host gene expression (FIGS. 5 and 13). We inserted DNA fragments complementary to gRNAs A-D into a humanized Cas9 expression vector (A/B in pX260; C/D in pX330) and tested their individual and combined abilities to alter the integrated HIV-1 genome activity. We first utilized the microglial cell line CHME5, which harbors integrated copies of a single round HIV-1 vector that includes the 5′ and 3′ LTRs, and a gene encoding an enhanced green fluorescent protein (EGFP) reporter replacing Gag (pNL4-3-ΔGag-d2EGFP). Treating CHME5 cells with trichostatin A (TSA), a histone deacetylase inhibitor, reactivates transcription from the majority of the integrated proviruses and leads to expression of EGFP and the remaining HIV-1 proteome. Expression of gRNAs plus Cas9 markedly decreased the fraction of TSA-induced EGFP-positive CHME5 cells (FIGS. 1A and 6). We detected insertion/deletion gene mutations (indels) for LTRs A-D (FIGS. 1B and 6B) using a Cel I nuclease-based heteroduplex-specific SURVEYOR assay. Similarly, expressing gRNAs targeting LTRs C and D in HeLa-derived TZM-bl cells, which contain stably incorporated HIV-1 LTR copies driving a firefly-luciferase reporter gene, suppressed viral promoter activity (FIG. 7A) and elicited indels within the LTR U3 region (FIG. 7B-D), as demonstrated by SURVEYOR and Sanger sequencing. Moreover, the combined expression of LTR C/D-targeting gRNAs in these cells caused excision of the predicted 302-bp viral DNA sequence and emergence of the residual 194-bp fragment (FIG. 7E-F).

Multiplex expression of LTR-A/B gRNAs in mixed clonal CHME5 cells caused deletion of a 190-bp fragment between the A and B target sites and led to indels to various extents (FIG. 1C-D). Among >20 puromycin-selected stable subclones, we found cell populations with complete blockade of TSA-induced HIV-1 proviral reactivation, as determined by flow cytometry for EGFP (FIG. 1E). PCR-based analysis for EGFP and the HIV-1 Rev response element (RRE) in the proviral genome validated the eradication of the HIV-1 genome (FIG. 1F, G). Furthermore, sequencing of the PCR products revealed that the entire 5′-3′ LTR-spanning viral genome was deleted, yielding a 351-bp fragment via a 190-bp excision between cleavage sites A and B (FIGS. 1G and 8), and a 682-bp fragment with a 175-bp insertion and a 27-bp deletion at the LTR-A and -B sites, respectively (FIG. 8C). The residual HIV-1 genome (FIG. 1F-H) may reflect the presence of trace Cas9/gRNA-negative cells. These results indicate that LTR-targeting Cas9/gRNAs A/B eradicate the HIV-1 genome and block its reactivation in latently infected microglial cells.

Example 3: Cas9/LTR-gRNA Efficiently Eradicates Latent HIV-1 Virus from U1 Monocytic Cells

The promonocytic U-937 cell subclone U1, an HIV-1 latency model for infected perivascular macrophages and monocytes, is chronically HIV-1-infected and exhibits low level constitutive viral gene expression and replication. GenomeWalker mapping detected two integrated proviral DNA copies at chromosomes Xp11.4 (FIG. 2A) and 2p21 (FIG. 9A) in U1 cells. A 9935-bp DNA fragment representing the entire 9709-bp proviral HIV-1 DNA plus a flanking 226-bp X-chromosome-derived sequence (FIG. 2A), and a 10176-bp fragment containing the 9709-bp HIV-1 genome plus its flanking 467-bp chromosome 2-derived sequence (FIG. 9A, B), were identified by long-range PCR analysis of the parental control or empty-vector (U6-CAG) U1 cells. The 226-bp and 467-bp fragments represent the predicted segments from the other copies of chromosomes X and 2, respectively, which lacked the integrated proviral DNA. In U1 cells expressing LTR-A/B gRNAs and Cas9, we found two additional DNA fragments of 833 and 670 bp in chromosome X and one additional 1102-bp fragment in chromosome 2. Thus, gRNAs A/B enabled Cas9 to excise the HIV-1 5′-3′ LTR-spanning viral genome segment in both chromosomes. The 833-bp fragment includes the expected 226 bp from the host genome and a 607-bp viral LTR sequence with a 27-bp deletion around the LTR-A site (FIG. 2A-B). The 670-bp fragment encompassed a 226-bp host sequence and a residual 444-bp viral LTR sequence after excision of the 190-bp fragment (FIG. 1D), caused by gRNAs-A/B-guided cleavage at both LTRs (FIG. 2A). The additional fragments did not emerge via circular LTR integration, because they were absent in the parental U1 cells, and such a circular LTR viral genome configuration occurs immediately after HIV-1 infection but is short lived and intolerant to repeated passaging. These cells exhibited substantially decreased HIV-1 viral load, shown by the functional p24 ELISA replication assay (FIG. 2C) and real-time PCR analysis (FIG. 9C, D). The detectable but low residual viral load and reactivation may result from cell population heterogeneity and/or incomplete genome editing. We also validated the ablation of the HIV-1 genome by Cas9/LTR-A/B gRNAs in latently infected J-Lat T cells harboring integrated HIV-R7/E-/EGFP using flow cytometry analysis, SURVEYOR assay and PCR genotyping (FIG. 10), supporting the results of previous reports on HIV-1 proviral deletion in Jurkat T cells by Cas9/gRNA and ZFN. Taken together, our results suggest that the multiplex LTR-gRNAs/Cas9 system efficiently suppresses HIV-1 replication and reactivation in latently HIV-1-infected “reservoir” (microglial, monocytic and T) cells typical of human latent HIV-1 infection, and in TZM-bl cells highly sensitive for detecting HIV-1 transcription and reactivation. Single or multiplex gRNAs targeting 5′- and 3′-LTRs effectively eradicated the entire HIV-1 genome.
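The fragment sizes reported for chromosome X can be checked with simple arithmetic. The sketch below assumes that a single LTR of 634 bp remains after excision of the 5′-3′ LTR-spanning segment; that length is taken from the full-length pHR′ 5′-LTR given in the target-selection section and is an assumption for the U1 provirus, although it agrees with the 607-bp and 444-bp residual LTR sequences reported here. The function names are hypothetical.

```python
PROVIRUS_BP = 9709                 # full-length proviral HIV-1 DNA
LTR_BP = 634                       # assumed length of one intact LTR
FLANK_BP = {"X": 226, "2": 467}    # host-derived flanking sequence per chromosome


def intact_amplicon(chrom):
    """Long-range PCR product size when the provirus is intact."""
    return PROVIRUS_BP + FLANK_BP[chrom]


def single_ltr_amplicon(chrom, net_change_bp):
    """PCR product size after excision leaves one LTR carrying a net
    insertion (+) or deletion (-) of net_change_bp."""
    return FLANK_BP[chrom] + LTR_BP + net_change_bp


if __name__ == "__main__":
    print(intact_amplicon("X"))             # 9709 + 226 = 9935
    print(intact_amplicon("2"))             # 9709 + 467 = 10176
    print(single_ltr_amplicon("X", -27))    # 226 + 607 = 833
    print(single_ltr_amplicon("X", -190))   # 226 + 444 = 670
```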

Example 4: Stable Expression of Cas9 Plus LTR-A/B Vaccinates TZM-bl Cells Against New HIV-1 Virus Infection

We next tested whether combined Cas9/LTR gRNAs can immunize cells against HIV-1 infection using stable Cas9/gRNAs-A and -B-expressing TZM-bl-based clones (FIG. 3A). Two of 7 puromycin-selected subclones exhibited efficient excision of the 190-bp LTR-A/B site-spanning DNA fragment (FIG. 3B). However, the remaining 5 subclones exhibited no excision (FIG. 3B) and no indel mutations, as verified by Sanger sequencing. PCR genotyping using primers targeting Cas9 and U6-LTR showed that none of these ineffective subclones retained the integrated copies of the Cas9/LTR-A/B gRNA expression cassettes (FIG. 11A, B). As a result, no expression of full-length Cas9 was detected (FIG. 11C, D). The long-term expression of Cas9/LTR-A/B gRNAs did not adversely affect cell growth or viability, suggesting a low occurrence of off-target interference with the host genome or Cas9-induced toxicity in this model. We assessed de novo HIV-1 replication by infecting cells with the VSVG-pseudotyped pNL4-3-ΔE-EGFP reporter virus, with EGFP-positivity by flow cytometry indicating HIV-1 replication. Unlike the control U6-CAG cells, the cells stably expressing Cas9/gRNAs LTRs-A/B failed to support HIV-1 replication at 2 d post infection, indicating that they were immunized effectively against new HIV-1 infection (FIG. 3C-D). A similar immunity against HIV-1 was observed in Cas9/LTR-A/B gRNA-expressing cells infected with native T-tropic X4 strain pNL4-3-ΔE-EGFP reporter virus (FIG. 12A) or native M-tropic R5 strains such as SF162 and JRFL (FIG. 12B-D).

Example 5: Off-Target Effects of Cas9/LTR-A/B on Human Genome

The appeal of Cas9/gRNA as an interventional approach rests on its highly specific on-target indel-producing cleavage, but multiplex gRNAs could potentially cause host genome mutagenesis and chromosomal disorders, cytotoxicity, genotoxicity, or oncogenesis. Fairly low viral-human genome homology reduces this risk, but the human genome contains numerous endogenous retroviral genomes that are potentially susceptible to HIV-1-directed gRNAs. Therefore, we assessed off-target effects of selected HIV-1 LTR gRNAs on the human genome. Because the 12-14-bp seed sequence nearest the protospacer-adjacent motif (PAM) region (NGG) is critical for cleavage specificity, we searched >14-bp seed+NGG, and found no off-target candidate sites for LTR gRNAs A-D (FIG. 13). As expected, progressively shorter gRNA segments yielded increasing numbers of off-target cleavage sites 100% matched to the corresponding on-target sequences (i.e., NGG+13 bp yielded 6, 0, 2 and 9 off-target sites, respectively, whereas NGG+12 bp yielded 16, 5, 16 and 29; FIG. 13). From human genomic DNA we obtained a 500-800-bp sequence covering one of the predicted off-target sites using high-fidelity PCR, and analyzed the potential mutations by SURVEYOR and Sanger sequencing. We found no mutations (see representative off-target sites #1, 5 and 6 in TZM-bl and U1 cells; FIG. 4A).
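The effect of seed length on the number of exact-match candidate sites can be illustrated with a short Python sketch. It counts occurrences of the PAM-proximal seed followed by an NGG PAM on both strands of a reference string. It is a toy stand-in for the blastn-based search, assuming an in-memory reference; the protospacer and reference shown are hypothetical.

```python
import re

# Complement table for unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")
# IUPAC codes needed for NGG and NRG PAMs.
IUPAC = {"N": "[ACGT]", "R": "[AG]"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_seed_hits(protospacer, reference, seed_len, pam="NGG"):
    """Count exact matches of the last seed_len nt of the protospacer
    (the PAM-proximal seed) followed by the PAM, on both strands."""
    seed = protospacer.upper()[-seed_len:]
    pam_regex = "".join(IUPAC.get(base, base) for base in pam.upper())
    # Lookahead so that overlapping occurrences are each counted once.
    pattern = re.compile("(?=%s%s)" % (seed, pam_regex))
    forward = reference.upper()
    reverse = reverse_complement(reference)
    return len(pattern.findall(forward)) + len(pattern.findall(reverse))


if __name__ == "__main__":
    # Hypothetical 20-nt protospacer and toy reference, not HIV-1 or human sequence.
    protospacer = "GATTACAGATTACAGATTAC"
    reference = "CCCC" + "ATTACAGATTAC" + "TGG" + "CCCC"
    for seed_len in (14, 13, 12):
        print(seed_len, count_seed_hits(protospacer, reference, seed_len))
```

Shorter seeds match more places simply because fewer bases must agree, which is the trend reported for NGG+13 bp versus NGG+12 bp.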

To assess the risk of off-target effects comprehensively, we performed whole genome sequencing (WGS) using the stable Cas9/gRNA A/B-expressing and control U6-CAG TZM-bl cells (FIG. 4B-D). We identified 676,105 indels, using a genome analysis toolkit (GATK, v.2.8.1) with human (hg19) and HIV-1 genomes as reference sequences. Among the indels, 24% occurred in the U6-CAG control, 26% in the LTR-A/B subclone, and 50% in both (FIG. 4B). Such a substantial inter-sample discrepancy in indel calling could suggest off-target effects, but most likely results from limited indel-calling confidence, limited WGS coverage (15-30×), and cellular heterogeneity. GATK reported only confidently-identified indels: some found in the U6-CAG control but not in the LTR-A/B subclone, and others in the LTR-A/B subclone but not in the U6-CAG control. We expected abundant missing indel calls for both samples due to the limited WGS coverage. Such limited indel-calling confidence also implies the possibility of false negatives: missed indels occurring in LTR-A/B but not in U6-CAG controls. Cellular heterogeneity may reflect variability of Cas9/gRNA editing efficiency and effects of passaging. Therefore, we tested whether each indel was LTR-A/B gRNA-induced, by analyzing the ±300 bp flanking each indel against the LTRs-A/-B-targeted sites of the HIV-1 genome and the predicted/potential gRNA off-target sites of the host genome (FIG. 15). For sequences 100% matched to one containing the seed (12-bp) plus NRG, we identified only 8 overlapping regions of 92 potential off-target sites against 676,105 indels: 6 indels occurring in both samples, and 2 only in the U6-CAG control (FIG. 4C, D). We also identified 2 indels on the HIV-1 LTR that occurred only in the LTR-A/B subclone but, as expected, not in the U6-CAG control (FIG. 4C). The results suggest that LTR-A/B gRNAs induce the indicated on-target indels, but no off-target indels, consistent with prior findings using deep sequencing of PCR products covering predicted/potential off-target sites.
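The three-way split of indel calls (control only, LTR-A/B only, both) amounts to set operations on the two call sets. The Python sketch below assumes each indel is identified by a (chromosome, position, reference allele, alternate allele) tuple; that key and the example calls are hypothetical, not the representation used in the analysis.

```python
def partition_indels(control_calls, edited_calls):
    """Split indel calls into those seen only in the control, only in the
    edited subclone, or in both samples."""
    control, edited = set(control_calls), set(edited_calls)
    return {
        "control_only": control - edited,
        "edited_only": edited - control,
        "shared": control & edited,
    }


def fractions(parts):
    """Return the fraction of all distinct indels falling into each category."""
    total = sum(len(calls) for calls in parts.values())
    if total == 0:
        return {name: 0.0 for name in parts}
    return {name: len(calls) / total for name, calls in parts.items()}


if __name__ == "__main__":
    # Hypothetical calls keyed as (chromosome, position, ref allele, alt allele).
    control = [("chr1", 100, "A", "AT"), ("chr2", 5, "GC", "G")]
    edited = [("chr1", 100, "A", "AT"), ("chrX", 7, "T", "TA")]
    parts = partition_indels(control, edited)
    print(fractions(parts))
```

Only the "edited_only" category can contain gRNA-induced indels, which is why the subsequent ±300 bp comparison against predicted target sites matters.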

Our combined approaches minimized off-target effects while achieving high efficiency and complete ablation of the genomically integrated HIV-1 provirus. In addition to an extremely low homology between the foreign viral genome and the host cellular genome, including endogenous retroviral DNA, the key design attributes in our study included: bioinformatic screening using the strictest 12-bp+NGG target-selection criteria to exclude off-target human transcriptome or (even rarely) untranslated-genomic sites; avoiding transcription factor binding sites within the HIV-1 LTR promoter (potentially conserved in the host genome); selection of LTR-A- and -B-directed, 30-bp gRNAs and a pre-crRNA system reflecting the original bacterial immune mechanism to enhance specificity/efficiency vs. a 20-bp gRNA-, chimeric crRNA-tracrRNA-based system; and WGS, Sanger sequencing and SURVEYOR assay, to identify and exclude potential off-target effects. Indeed, the use of newly developed Cas9 double-nicking and RNA-guided FokI nuclease may further assist identification of new targets within the various conserved regions of HIV-1 with reduced off-target effects.

Our results show that the HIV-1 Cas9/gRNA system has the ability to target more than one copy of the LTR, positioned on different chromosomes, suggesting that this genome editing system can alter the DNA sequence of HIV-1 in a latently infected patient's cells harboring multiple proviral DNAs. To further ensure high editing efficacy and consistency of our technology, one may consider the most stable region of the HIV-1 genome as a target to eradicate HIV-1 in patient samples, which may harbor more than one strain of HIV-1. Alternatively, one may develop personalized treatment modalities based on the data from deep sequencing of the patient-derived viral genome prior to engineering therapeutic Cas9/gRNA molecules.

Our results also demonstrate that Cas9/gRNA genome editing can be used to immunize cells against HIV-1 infection. The preventative vaccination is independent of HIV-1 strain diversity because the system targets genomic sequences regardless of how the viruses enter the infected cells. The preexistence of the Cas9/gRNA system in cells led to a rapid elimination of the new HIV-1 before it integrated into the host genome. One may explore various systems for delivery of Cas9/LTR-gRNA for immunizing high-risk subjects, e.g., gene therapies (viral vector and nanoparticle) and transplantation of autologous Cas9/gRNA-modified bone marrow stem/progenitor cells or induced pluripotent stem cells for eradicating HIV-1 infection.

Here, we demonstrated the high specificity of Cas9/gRNAs in editing the HIV-1 target genome. Results from subclone data revealed the strict dependence of genome editing on the presence of both Cas9 and gRNA. Moreover, a single nucleotide mismatch in the designed gRNA target will disable the editing potency. In addition, all 4 of our designed LTR gRNAs worked well with different cell lines, indicating that the editing is more efficient in the HIV-1 genome than in the host cellular genome, wherein not all designed gRNAs are functional, which may be due to different epigenetic regulation, variable genome accessibility, or other reasons. Given the ease and rapidity of Cas9/gRNA development, even if HIV-1 mutations confer resistance to one Cas9/gRNA-based therapy, as described above, HIV-1 variants can be genotyped to enable another personalized therapy for individual patients.

A number of embodiments of the invention have been described. Nevertheless, it will be understood that various modifications may be made without departing from the spirit and scope of the invention. Accordingly, other embodiments are within the scope of the following claims.

What is claimed is:
 1. A method of preventing transmission of a retrovirus from a mother to her offspring, including the steps of: treating the mother's host cells with a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA; and preventing transmission of the proviral DNA to the offspring.
 2. The method of claim 1, wherein said treating step further includes the steps of cleaving a double strand of the proviral DNA at a first target protospacer sequence with the CRISPR-associated endonuclease; cleaving a double strand of the proviral DNA at a second target protospacer sequence with the CRISPR-associated endonuclease; excising an entire proviral genome of the proviral DNA; and eradicating the proviral DNA from the host cell.
 3. The method of claim 1, wherein said step of treating the host cell includes the steps of: exposing the host cell to a composition including an isolated nucleic acid encoding the CRISPR-associated endonuclease; an isolated nucleic acid sequence encoding a first gRNA having a first spacer sequence that is complementary to a first target protospacer sequence in a proviral DNA; and an isolated nucleic acid encoding a second gRNA having a second spacer sequence that is complementary to a second target protospacer sequence in the proviral DNA; expressing in the host cell the CRISPR-associated endonuclease, the first gRNA, and the second gRNA; assembling, in the host cell, a first gene editing complex including the CRISPR-associated endonuclease and the first gRNA; and a second gene editing complex including the CRISPR-associated endonuclease and the second gRNA; directing the first gene editing complex to the first target protospacer sequence by complementary base pairing between the first spacer sequence and the first target protospacer sequence; and directing the second gene editing complex to the second target protospacer sequence by complementary base pairing between the second spacer sequence and the second target protospacer sequence.
 4. The method of claim 2, wherein at least one of the first target protospacer sequence and the second target protospacer sequence is situated within the U3 region of the LTR.
 5. The method of claim 1, wherein the retrovirus is selected from the group consisting of human immunodeficiency virus-1 (HIV-1), HIV-2, simian immunodeficiency virus (SIV), feline immunodeficiency virus (FIV), bovine immunodeficiency virus (BIV), equine infectious anemia virus (EIAV), and caprine arthritis/encephalitis virus (CAEV).
 6. The method of claim 1, wherein the CRISPR-associated endonuclease is Cas9 or a human-optimized Cas9.
 7. The method of claim 3, wherein the isolated nucleic acids encoding a CRISPR-associated endonuclease, the first gRNA, and the second gRNA, are encoded in at least one expression vector.
 8. The method of claim 7, wherein the at least one expression vector is selected from the group consisting of a plasmid vector, a lentiviral vector, an adenoviral vector, and an adeno-associated virus vector.
 9. The method of claim 1, wherein at least one of the gRNAs comprises a CRISPR RNA (crRNA) and a trans-activated small RNA (tracrRNA), which are expressed as separate nucleic acids.
 10. The method of claim 1, wherein at least one of the gRNAs is engineered as an artificial fusion small guide RNA (sgRNA) comprised of a crRNA and a tracrRNA.
 11. The method of claim 1, further including the step of immunizing the host cell against new retroviral infection.
 12. The method of claim 1, wherein the host cell latently infected with a retrovirus is chosen from the group consisting of a CD4+ T cell, a macrophage, a monocyte, a gut associated lymphoid cell, a microglial cell, and an astrocyte.
 13. The method of claim 1, wherein the mother is pregnant and wherein said preventing step is further defined as preventing transmission of the proviral DNA to the offspring in utero.
 14. The method of claim 1, wherein the mother is lactating and wherein said preventing step is further defined as preventing transmission of the proviral DNA to the offspring through breast milk.
 15. The method of claim 1, wherein said treating step occurs during a time period chosen from the group consisting of prenatally, perinatally, postnatally, and combinations thereof.
 16. The method of claim 1, further including the step of treating an infant's host cells with a composition comprising a Clustered Regularly Interspaced Short Palindromic Repeat (CRISPR)-associated endonuclease, and two or more different guide RNAs (gRNAs), wherein each of the at least two gRNAs is complementary to a different target nucleic acid sequence in a long terminal repeat (LTR) of the proviral DNA after delivery.